From amenkov at openjdk.org Sat Jun 1 01:07:11 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 1 Jun 2024 01:07:11 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 23:55:20 GMT, Serguei Spitsyn wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. >> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. 
>> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: refactored def and use of process_pending_interp_only() test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 40: > 38: > 39: static const char* CTHREAD_NAME_START = "ForkJoinPool"; > 40: static const size_t CTHREAD_NAME_START_LEN = (int)strlen("ForkJoinPool"); `(int)` cast is not needed test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 58: > 56: cthreads[ct_cnt++] = jni->NewGlobalRef(thread); > 57: } > 58: deallocate(jvmti, jni, (void*)tname); cast to `void*` is not needed test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 96: > 94: } > 95: jvmtiError err = jvmti->Deallocate((unsigned char*)carrier_threads); > 96: check_jvmti_status(jni, err, "deallocate: error in JVMTI Deallocate call"); replace with `deallocate(jvmti, jni, carrier_threads);` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1623060427 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1623061692 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1623061890 From amenkov at openjdk.org Sat Jun 1 01:07:11 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 1 Jun 2024 01:07:11 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v3] In-Reply-To: <7D1Cchdl8jpFGHWJq0YLCELHQGXz6OLpkxHdLahhgmA=.4b815259-ba39-4ecb-9819-585c0123fca5@github.com> References: <7D1Cchdl8jpFGHWJq0YLCELHQGXz6OLpkxHdLahhgmA=.4b815259-ba39-4ecb-9819-585c0123fca5@github.com> Message-ID: On Thu, 30 May 2024 02:41:39 GMT, Serguei Spitsyn wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp line 201: >> >>> 199: >>> 200: // need to reset this value after the breakpoint_hit1 >>> 201: received_method_exit_event = JNI_FALSE; >> >> There was a loom-dev email thread regarding this last year. Seems related. I had concluded that the way the test was written that no MethodExit event should have been received. I'm not sure if I missed something in my analysis or if this failure is a result of your changes: >> >> https://mail.openjdk.org/pipermail/loom-dev/2023-August/006059.html >> https://mail.openjdk.org/pipermail/loom-dev/2023-September/006170.html > > Thank you for the comment and links to the discussion. In fact, I've observed the MethodExit events really posted between the breakpoint hits: `hit1` and `hit2`. The first one is at the return from the `unmount()` method. I was not able to prove why they should not be expected. I'm not sure I follow the test logic. Its summary says "Verifies that MethodExit events are delivered on both carrier and virtual threads", but now it just ignores MethodExit requested for carrier thread in breakpoint_hit1. Then there is no sense to request the event on carrier thread. 
Per the test summary I'd expect the test should test MethodExit for carrier thread, but then java part needs to force unmount ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1623073345 From stuefe at openjdk.org Sat Jun 1 05:33:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 1 Jun 2024 05:33:01 GMT Subject: RFR: 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries [v2] In-Reply-To: References: Message-ID: <8QqhD-2zDfWLAe0s8XWLqDe2QxNO7Uy5OwlkvvUEei4=.6dd2e8a4-46a9-4648-92fd-ef61f84e00a2@github.com> On Fri, 31 May 2024 23:12:27 GMT, David Holmes wrote: >> By using the `int*` type the assert could fail if the lower 32-bits of the function address were all zero. Trivial fix is to change to a type that is guaranteed the right size: `intptr_t*` >> >> Testing was done manually - see the JBS issue. >> >> Also run tier4 testing a sanity as it include `-Xcheck:jni`. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Copyright year Okay. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19491#pullrequestreview-2092020656 From dholmes at openjdk.org Sat Jun 1 05:33:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 1 Jun 2024 05:33:02 GMT Subject: RFR: 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries [v2] In-Reply-To: <8QqhD-2zDfWLAe0s8XWLqDe2QxNO7Uy5OwlkvvUEei4=.6dd2e8a4-46a9-4648-92fd-ef61f84e00a2@github.com> References: <8QqhD-2zDfWLAe0s8XWLqDe2QxNO7Uy5OwlkvvUEei4=.6dd2e8a4-46a9-4648-92fd-ef61f84e00a2@github.com> Message-ID: On Sat, 1 Jun 2024 05:28:21 GMT, Thomas Stuefe wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Copyright year > > Okay. Thanks for the review @tstuefe ! 
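
For reference, the `int*` versus `intptr_t*` problem described in the quoted PR text can be sketched as follows. This is a minimal illustration, assuming a 64-bit platform; the `Slot` type and the helper names are hypothetical and are not the HotSpot code. A check that looks only at the lower 32 bits of a pointer-sized table entry reports a valid address as empty whenever those bits happen to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one entry of a native function table:
// a pointer-sized value holding a function address.
using Slot = std::uintptr_t;
static_assert(sizeof(Slot) == 8, "this sketch assumes a 64-bit platform");

// Models the buggy check: only the lower 32 bits of the entry are examined,
// which is what a read through a 32-bit type sees on a little-endian machine.
bool looks_set_32bit(const Slot* slot) {
  const std::uint32_t low = static_cast<std::uint32_t>(*slot);
  return low != 0;
}

// Models the fixed check: the whole pointer-sized entry is examined.
bool looks_set_pointer_sized(const Slot* slot) {
  const std::intptr_t whole = static_cast<std::intptr_t>(*slot);
  return whole != 0;
}

// A made-up address whose lower 32 bits are all zero.
void demonstrate_truncation() {
  const Slot entry = 0x0000000100000000ULL;
  assert(!looks_set_32bit(&entry));         // false negative: entry looks empty
  assert(looks_set_pointer_sized(&entry));  // full-width check sees the address
}
```

The address used in `demonstrate_truncation` is invented for the example; any address with zero in its lower half would behave the same way.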
------------- PR Comment: https://git.openjdk.org/jdk/pull/19491#issuecomment-2143304048 From vlivanov at openjdk.org Sat Jun 1 05:54:05 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 1 Jun 2024 05:54:05 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> Message-ID: <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> On Thu, 30 May 2024 00:25:52 GMT, Calvin Cheung wrote: >> This still seems convoluted to me. A -Xlog option shouldn't control anything but logging. If you want a set of counters enabled then use the flag to enable them, and separately use -Xlog:init to print them (though whether "init" is appropriate here is another matter). You could use -Xlog:init+foo to be more selective about which counters. > > I've modified the fix so that the user needs to specify both `-Xlog:init` and `-XX:+ProfileClassLinkage` for the counters to be printed. IMO a dedicated flag (`ProfileClassLinkage`) is well-justified here. `-Xlog:init` prints some data which is not collected by default. So, if a user explicitly specifies `-Xlog:init`, the expectation is JVM automatically enables relevant profiling logic. There's no need to require a user to explicitly specify another flag. (The main reason the data is not collected by default, unlike most of PerfData counters, is because Calvin spotted some negative effects on startup when profiling is turned on.) Speaking of the extra flag itself, it can be achieved solely by consulting whether `-Xlog:init` is enabled or not (replace `ProfileClassLinkage` with `log_is_enabled(Info, init)` checks). But I find it clearer and more convenient to control profiling logic with a dedicated flag. 
As a bonus, it fits nicely with the rest of PerfData framework: when `-XX:+ProfileClassLinkage` is specified w/o `-Xlog:init`, it is still possible to dump the data using external tools (jcmd and jstat). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1623155266 From stuefe at openjdk.org Sat Jun 1 09:01:29 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 1 Jun 2024 09:01:29 GMT Subject: RFR: 8330174: Establish no-access zone at the start of Klass encoding range Message-ID: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com> After having reserved an address range for the Klass encoding range, we either: a) Place CDS, then class space, into that address range b) Place only class space in that range (if CDS is off). For an nKlass of 0, the decoded Klasspointer points to the beginning of the encoding range. Since nKlass=0 is a special value, both CDS (a) and Metaspace (b) ensure that no Klass is placed right at the start of the Klass range. However, it would also be good to establish a no-access zone at the range's start. Dereferencing an nKlass=0 would then result in an immediate, obvious crash instead of in reading invalid data. This would closely mimic what we do in the compressed-oops-enabled java heap (albeit there we do it for fault-based null checks, too) and what Operating Systems do with low-address ranges. --- The patch: We can neither move the encoding base down one page (the encoding base is carefully chosen to fit the platform's decoding). Nor can we move CDS archive space up one page (since CDS relies on the archive being placed exactly at the encoding base address). Nor do we want to move class space up (since class space start has a high alignment requirement of 16MB, protection zone would need to be 16MB large, which is a waste of address space). Instead, as before, we just let Metaspace and CDS handle the protection zone internally. 
For Metaspace, this is very simple. We just protect the first page of class space. For CDS, it is a tiny bit more complex since we need to leave a "protection-zone-shaped hole" in the first region of the archive when we dump it. We do just that and then give that region a new property, "has protection zone". At runtime, we protect the underlying memory if a mapped region has a protection zone. With CDS, because the page size can differ between dump- and runtime, the protection zone is the size of CDS core region alignment, not page-sized (e.g. dumping on Linux aarch64 with 4KB pages shall generate an archive that can be used in Docker on MacOS with 16KB pages). ---- Tests: - ran CDS and AppCDS jtreg tests manually on Mac m1 - manually tested that decoding, then dereferencing an nKlass=0 gives us the new "Fault address is narrow Klass base - dereferencing a zero nKlass?" output in the hs-err file - GHAs (which include the new regression test) ------------- Commit messages: - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix - Update metaspace.cpp - cds-metaspace-prot-prefix Changes: https://git.openjdk.org/jdk/pull/19290/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330174 Stats: 253 lines in 12 files changed: 214 ins; 15 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/19290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19290/head:pull/19290 PR: https://git.openjdk.org/jdk/pull/19290 From aph-open at littlepinkcloud.com Sat Jun 1 10:13:41 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 1 Jun 2024 11:13:41 +0100 Subject: Structure of the HotSpot Interpreter In-Reply-To: References: Message-ID: On 5/31/24 13:24, Julian Waters wrote: > Thanks for the overview. I unfortunately can't do +PrintInterpreter at > the moment since my JDK is experiencing compilation failures That doesn't matter: the Java you're using as a bootstrap compiler will do it. 
> everywhere (It's in a bit of a mess right now), but I will try doing
> that once I've gotten everything fixed. However, I've been digging
> through the code a little, and I think I see a bit of a pattern. The
> methods in the Template Table files are all geared towards emitting
> the executable code into memory, and each of their methods are passed
> as a pointer to the corresponding bytecode definition to actually emit
> code into memory. The dispatch mechanism is still a bit of a mystery
> to me,

Why? It's the four instructions that I quoted last time:

   0x0000ffff785428cc:   ldrb w8, [x22, #1]!   ;; 403:   __ dispatch_epilog(tos_out, step);

Offset to the dispatch table:

   0x0000ffff785428d0:   add w9, w8, #0x500

Load the address of the next action, and jump to it:

   0x0000ffff785428d4:   ldr x9, [x21, w9, uxtw #3]
   0x0000ffff785428d8:   br x9

> but from what I can see the code that dispatches to the next
> bytecode is emitted by dispatch_next. Did I get all of that right, or
> is there anything I am missing?

By the dispatch_XXX methods, yes.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From dholmes at openjdk.org Sun Jun 2 20:13:04 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sun, 2 Jun 2024 20:13:04 GMT
Subject: Integrated: 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries
In-Reply-To:
References:
Message-ID:

On Thu, 30 May 2024 23:43:00 GMT, David Holmes wrote:

> By using the `int*` type the assert could fail if the lower 32-bits of the function address were all zero. Trivial fix is to change to a type that is guaranteed the right size: `intptr_t*`
>
> Testing was done manually - see the JBS issue.
>
> Also run tier4 testing a sanity as it include `-Xcheck:jni`.
>
> Thanks.

This pull request has now been integrated.
Changeset: 8338946a Author: David Holmes URL: https://git.openjdk.org/jdk/commit/8338946a6d765eab9cd7a6cbc24c865a9cd355e7 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries Reviewed-by: dcubed, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/19491 From dholmes at openjdk.org Sun Jun 2 22:09:29 2024 From: dholmes at openjdk.org (David Holmes) Date: Sun, 2 Jun 2024 22:09:29 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case Message-ID: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. Adds unit testing for the specialized cases. See JBS for discussion of other suggestions. Testing: - tiers 1-4 Thanks ------------- Commit messages: - Correct comment - Merge branch 'master' into 8256828-ostream - 8256828: ostream::print_cr() truncates buffer in copy-through case Changes: https://git.openjdk.org/jdk/pull/19512/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8256828 Stats: 315 lines in 4 files changed: 296 ins; 2 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/19512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19512/head:pull/19512 PR: https://git.openjdk.org/jdk/pull/19512 From lmesnik at openjdk.org Mon Jun 3 01:01:14 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 3 Jun 2024 01:01:14 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v5] In-Reply-To: References: <2Aorg4EW1Sl5s0tplzUb89ZNUeZg2xsPj3VkJQflzN4=.9072eee0-c481-4da9-ade9-5595ab78030f@github.com> Message-ID: On Wed, 29 May 2024 01:18:57 GMT, Serguei Spitsyn wrote: >> Leonid Mesnik has updated the pull request incrementally with two 
additional commits since the last revision: >> >> - fixed space. >> - The result is updated. > > src/hotspot/share/prims/jvmtiTrace.cpp line 284: > >> 282: JavaThreadState current_state = JavaThread::cast(Thread::current())->thread_state(); >> 283: if (current_state == _thread_in_native || current_state == _thread_blocked) { >> 284: return "not readable"; > > Nit: I'd suggest to make it more detailed, something like like this: > "" or "" @sspitsyn, @dholmes-ora Thanks for the naming suggestion, looks to long in the report. Let me try to use logging and see if it makes sense to make more improvements. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19275#discussion_r1623691234 From ccheung at openjdk.org Mon Jun 3 06:11:16 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 3 Jun 2024 06:11:16 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v6] In-Reply-To: References: Message-ID: > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. Calvin Cheung has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - @iwanowww comments - Merge branch 'master' into xloginit-classloading - more comments from @dholmes-ora - @dholmes-ora comments - comments from Ioi - Merge branch 'master' into xloginit-classloading - fix build issues on macos-x64 and -aarch64 - Merge branch 'master' into xloginit-classloading - fix linux-x86 and minimal build issues - 8330198: Add some class loading related perf counters to measure VM startup ------------- Changes: https://git.openjdk.org/jdk/pull/18790/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=05 Stats: 172 lines in 15 files changed: 157 ins; 6 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From ccheung at openjdk.org Mon Jun 3 06:11:16 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 3 Jun 2024 06:11:16 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> Message-ID: On Sat, 1 Jun 2024 05:51:03 GMT, Vladimir Ivanov wrote: >> I've modified the fix so that the user needs to specify both `-Xlog:init` and `-XX:+ProfileClassLinkage` for the counters to be printed. > > IMO a dedicated flag (`ProfileClassLinkage`) is well-justified here. `-Xlog:init` prints some data which is not collected by default. So, if a user explicitly specifies `-Xlog:init`, the expectation is JVM automatically enables relevant profiling logic. There's no need to require a user to explicitly specify another flag. 
> > (The main reason the data is not collected by default, unlike most of PerfData counters, is because Calvin spotted some negative effects on startup when profiling is turned on.) > > Speaking of the extra flag itself, it can be achieved solely by consulting whether `-Xlog:init` is enabled or not (replace `ProfileClassLinkage` with `log_is_enabled(Info, init)` checks). But I find it clearer and more convenient to control profiling logic with a dedicated flag. As a bonus, it fits nicely with the rest of PerfData framework: when `-XX:+ProfileClassLinkage` is specified w/o `-Xlog:init`, it is still possible to dump the data using external tools (jcmd and jstat). Thanks @iwanowww for chiming in. I've changed `arguments.cpp` back to the first version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1623827552 From rehn at openjdk.org Mon Jun 3 06:25:08 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 3 Jun 2024 06:25:08 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 14:40:15 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintenance. >> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. >> >> Thanks! >> >> * Tests are still running, so far so good. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > adjust accessibility Hi Hamlin, this will cause merge hell for me in https://github.com/openjdk/jdk/pull/19453. You have a bunch of stuff in hpp files which can be moved to cpp file, to keep headers clean and not export so. E.g.
check_movptr1_data_dependency is only used in macroAssembler_riscv.cpp, now everyone including MASM.hpp get copy of this method (which is compiled away). I.e. if you want it to be a member method move the definition to cpp and just keep the declaration in hpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2144368132 From stuefe at openjdk.org Mon Jun 3 06:41:02 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 3 Jun 2024 06:41:02 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Sun, 2 Jun 2024 22:05:40 GMT, David Holmes wrote: > Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. > > Adds unit testing for the specialized cases. > > See JBS for discussion of other suggestions. > > Testing: - tiers 1-4 > > Thanks So, just to clarify, this is a behavioral change, right? Where before we would not truncate raw strings, now we do, since result_len is used to determine how many bytes we will write to the output sink. src/hotspot/share/utilities/ostream.cpp line 109: > 107: result_len = buflen - 1; > 108: } > 109: } else if (format[0] == '%' && format[1] == 's' && format[2] == '\0') { nit, preexisting: why not just `strncmp(format, "%s", 3) == 0`? src/hotspot/share/utilities/ostream.hpp line 74: > 72: // of the returned string. > 73: // > 74: // In a debug build, if truncation occurs a VM warning is issued. I had to think a bit (I am not a native English speaker) about what the "Nominally" means, but I think it is supposed to contrast the second paragraph? As in "Normally we do that, but in the case of ... we do... ?". Same for "idiomatically" - what does that signify? 
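
The `strncmp` suggestion in the first review comment above can be checked with a small standalone sketch. The helper names are hypothetical; only the two conditions are taken from the review. Comparing three bytes against the literal `"%s"` includes its terminating NUL, so the call should match only when `format` is exactly `"%s"`.

```cpp
#include <cassert>
#include <cstring>

// Condition as quoted from ostream.cpp in the review comment.
// Short-circuit evaluation keeps every read inside the string.
bool is_plain_s_by_index(const char* format) {
  return format[0] == '%' && format[1] == 's' && format[2] == '\0';
}

// The alternative proposed by the reviewer.
bool is_plain_s_by_strncmp(const char* format) {
  return std::strncmp(format, "%s", 3) == 0;
}

// Compares both predicates on a handful of format strings.
void check_equivalence() {
  const char* samples[] = {"%s", "%s\n", "%d", "%", "", "%sx", "s%"};
  for (const char* f : samples) {
    assert(is_plain_s_by_index(f) == is_plain_s_by_strncmp(f));
  }
}
```

Both forms stop at the first NUL, so neither reads past the end of a shorter format string.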
------------- PR Review: https://git.openjdk.org/jdk/pull/19512#pullrequestreview-2092883626 PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1623841035 PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1623850806 From duke at openjdk.org Mon Jun 3 06:45:22 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 3 Jun 2024 06:45:22 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v7] In-Reply-To: References: Message-ID: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com> > The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It shows a regression on some platforms, such as Apple silicon on macOS > 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > They can be fixed by: > 1 Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we see a performance improvement (N1 or N2) > 2 Checking for the special pattern and merging the subsequent dmb. > > It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these. > > This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it merges the last 2 instructions but cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I think it is not a common pattern. > > In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that used a state machine for merging, but keeping an instruction pending during emission looks risky.
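
The barrier merging described above can be illustrated with a toy model. This is a rough sketch under stated assumptions, not the MacroAssembler code from the PR: the three encodings are the ones that appear in the gtest quoted elsewhere in this thread, while `ToyBuffer`, `merge_dmb`, and the `always_merge` field (a stand-in for the `AlwaysMergeDMB` flag) are hypothetical. The sketch also ignores labels and code buffer expansion, which the real patch has to handle.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodings as they appear in the gtest quoted elsewhere in this thread.
constexpr std::uint32_t kDmbIshLd = 0xd50339bf;  // dmb ishld
constexpr std::uint32_t kDmbIshSt = 0xd5033abf;  // dmb ishst
constexpr std::uint32_t kDmbIsh   = 0xd5033bbf;  // dmb ish

// The three encodings differ only in two option bits, so OR-ing two of
// them gives the encoding of a barrier that covers both.
constexpr std::uint32_t merge_dmb(std::uint32_t a, std::uint32_t b) {
  return a | b;
}
static_assert(merge_dmb(kDmbIshLd, kDmbIshSt) == kDmbIsh,
              "ld + st gives the full barrier");

// Hypothetical instruction buffer that merges a new barrier into the
// immediately preceding instruction when that is also a barrier.
struct ToyBuffer {
  std::vector<std::uint32_t> insts;
  bool always_merge = true;  // stand-in for AlwaysMergeDMB

  static bool is_dmb(std::uint32_t inst) {
    return inst == kDmbIshLd || inst == kDmbIshSt || inst == kDmbIsh;
  }

  void emit_dmb(std::uint32_t kind) {
    assert(is_dmb(kind));
    if (insts.empty() || !is_dmb(insts.back())) {
      insts.push_back(kind);
      return;
    }
    const std::uint32_t prev = insts.back();
    const std::uint32_t merged = merge_dmb(prev, kind);
    if (merged == prev) {
      return;                 // previous barrier already covers the new one
    }
    if (merged == kind || always_merge) {
      insts.back() = merged;  // widen the previous barrier in place
    } else {
      insts.push_back(kind);  // keep "ld; st" as two separate instructions
    }
  }
};
```

With `always_merge` off, emitting ld, st, and then the full barrier leaves two instructions in this model, because only the last two are merged; that mirrors the unhandled three-instruction case mentioned in the description.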
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Use constexpr for test encoding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19278/files - new: https://git.openjdk.org/jdk/pull/19278/files/7df2103f..68c3018c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=05-06 Stats: 25 lines in 1 file changed: 5 ins; 4 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278 PR: https://git.openjdk.org/jdk/pull/19278 From duke at openjdk.org Mon Jun 3 06:45:23 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 3 Jun 2024 06:45:23 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 10:06:06 GMT, kuaiwei wrote: > > > > I can run the jcstress test. I will run fastdebug build with `java -jar jcstress-latest.jar -tb 24h` Is it the correct command > > > > > Yes, I think so. > > > > > > FTR, I ran Linux AArch64 server release on Graviton 3 instance (ergonomics selects `-AlwaysMergeDMB` there) for 12 hours. Apart from failures from [JDK-8332670](https://bugs.openjdk.org/browse/JDK-8332670), I see no other trouble. Scheduled a quick run with `+AlwaysMergeDMB` as well. > > Thanks for testing. I'm running jcstress on a neoverse-n2 instance. I got some "soft errs" in console output. Are they real error? My jcstress test is done and no error is reported. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2144394730 From duke at openjdk.org Mon Jun 3 06:45:23 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 3 Jun 2024 06:45:23 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v6] In-Reply-To: References: <-EyonABDkbgLwga8XTUDIaJeKhd8qwOmMmYtjFg90RQ=.eb6aebe4-b3e0-4225-9413-57a37730643c@github.com> Message-ID: <17Do100zToDoBbGh90wEsl1mckJcIFj0AdJhV-wMceo=.975190eb-1d47-4ab0-8152-881cc35c8b31@github.com> On Fri, 31 May 2024 11:56:26 GMT, kuaiwei wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 153: >> >>> 151: Assembler::bind(L); >>> 152: code()->clear_last_insn(); >>> 153: code()->set_last_label(pc()); >> >> OK, so we have added `_last_label` to shared code in `codeBuffer`, but only update it in aarch64. This would be surprising for other platforms. On the other hand, this is what we already do with `_last_insn` -- only implementing it for specific platforms. Probably fine, but it would be nice to strengthen this with asserts, maybe in separate PR. > > It reminds me it could be applied to riscv. It also need merge membar. I will move this part to a new PR. I need _last_label in this patch. I need it to check previous 2 instructions and they are not cross block boundary. I will create a new PR for RISCV only. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1623854892 From duke at openjdk.org Mon Jun 3 06:45:23 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 3 Jun 2024 06:45:23 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v6] In-Reply-To: <-EyonABDkbgLwga8XTUDIaJeKhd8qwOmMmYtjFg90RQ=.eb6aebe4-b3e0-4225-9413-57a37730643c@github.com> References: <-EyonABDkbgLwga8XTUDIaJeKhd8qwOmMmYtjFg90RQ=.eb6aebe4-b3e0-4225-9413-57a37730643c@github.com> Message-ID: On Fri, 31 May 2024 10:08:47 GMT, Aleksey Shipilev wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment in aarch64.ad > > test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 211: > >> 209: constexpr uint32_t test_encode_dmb_ld = 0xd50339bf; >> 210: constexpr uint32_t test_encode_dmb_st = 0xd5033abf; >> 211: constexpr uint32_t test_encode_dmb = 0xd5033bbf; > > Can you maybe move these to the top, and use these constants across the test? You would not need the comments like `0xd5033abf, // dmb.ishst` then. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1623852788 From gcao at openjdk.org Mon Jun 3 06:47:10 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 3 Jun 2024 06:47:10 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well Message-ID: Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. By the way, This optimization depends on availability of the Zbb extension which has the cpop instruction. 
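
As a rough illustration of why a population-count instruction such as `cpop` matters for a hashed lookup, here is a sketch of the general bitmap-plus-popcount idea. It assumes a design with one bit per occupied hash slot and the occupants stored densely in slot order; `PackedTable` and its members are hypothetical, integer values stand in for class pointers, and collisions between entries that hash to the same slot are left out entirely. It is not the code in this PR.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical packed table: one bit per occupied hash slot (0..63) and
// the occupants stored densely in ascending slot order.
struct PackedTable {
  std::uint64_t bitmap = 0;
  std::vector<int> entries;  // stand-ins for class pointers

  // Number of occupied slots strictly below `slot`, which is also the
  // position of `slot`'s occupant in `entries`. std::bitset::count() is a
  // portable population count.
  static unsigned rank(std::uint64_t bitmap, unsigned slot) {
    const std::uint64_t below = bitmap & ((std::uint64_t{1} << slot) - 1);
    return static_cast<unsigned>(std::bitset<64>(below).count());
  }

  bool occupied(unsigned slot) const {
    return ((bitmap >> slot) & 1) != 0;
  }

  // Inserts into an empty slot; collisions are not modelled in this sketch.
  void insert(unsigned slot, int value) {
    assert(slot < 64 && !occupied(slot));
    entries.insert(entries.begin() + rank(bitmap, slot), value);
    bitmap |= std::uint64_t{1} << slot;
  }

  bool contains(unsigned slot, int value) const {
    if (!occupied(slot)) {
      return false;  // empty slot: a negative answer without touching entries
    }
    return entries[rank(bitmap, slot)] == value;
  }
};
```

In this model a lookup that lands on an empty slot is answered from the bitmap alone, regardless of how many entries the table holds.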
### Correctness testing:
- [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
- [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
- [x] Run all of tier1 with `-XX:+VerifySecondarySupers`

### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb:

Original:

Benchmark                                                   Mode Cnt      Score      Error  Units
SecondarySuperCacheHits.test                                avgt  15     11.375 ±    0.071  ns/op
SecondarySuperCacheInterContention.test                     avgt  15    646.087 ±   32.587  ns/op
SecondarySuperCacheInterContention.test:t1                  avgt  15    600.090 ±   83.779  ns/op
SecondarySuperCacheInterContention.test:t2                  avgt  15    692.084 ±   73.218  ns/op
SecondarySupersLookup.testNegative00                        avgt  15     16.420 ±    0.239  ns/op
SecondarySupersLookup.testNegative01                        avgt  15     18.307 ±    0.260  ns/op
SecondarySupersLookup.testNegative02                        avgt  15     21.695 ±    0.458  ns/op
SecondarySupersLookup.testNegative03                        avgt  15     24.855 ±    0.664  ns/op
SecondarySupersLookup.testNegative04                        avgt  15     27.305 ±    0.522  ns/op
SecondarySupersLookup.testNegative05                        avgt  15     29.719 ±    0.385  ns/op
SecondarySupersLookup.testNegative06                        avgt  15     32.231 ±    0.498  ns/op
SecondarySupersLookup.testNegative07                        avgt  15     33.747 ±    0.603  ns/op
SecondarySupersLookup.testNegative08                        avgt  15     35.856 ±    0.629  ns/op
SecondarySupersLookup.testNegative09                        avgt  15     37.077 ±    0.546  ns/op
SecondarySupersLookup.testNegative10                        avgt  15     39.408 ±    0.465  ns/op
SecondarySupersLookup.testNegative16                        avgt  15     51.041 ±    0.547  ns/op
SecondarySupersLookup.testNegative20                        avgt  15     58.722 ±    0.922  ns/op
SecondarySupersLookup.testNegative30                        avgt  15     77.310 ±    0.654  ns/op
SecondarySupersLookup.testNegative32                        avgt  15     81.116 ±    0.854  ns/op
SecondarySupersLookup.testNegative40                        avgt  15     96.311 ±    0.840  ns/op
SecondarySupersLookup.testNegative50                        avgt  15    115.427 ±    0.838  ns/op
SecondarySupersLookup.testNegative55                        avgt  15    124.371 ±    1.076  ns/op
SecondarySupersLookup.testNegative56                        avgt  15    126.796 ±    0.916  ns/op
SecondarySupersLookup.testNegative57                        avgt  15    127.952 ±    1.202  ns/op
SecondarySupersLookup.testNegative58                        avgt  15    131.956 ±    4.515  ns/op
SecondarySupersLookup.testNegative59                        avgt  15    131.858 ±    1.066  ns/op
SecondarySupersLookup.testNegative60                        avgt  15    134.294 ±    0.888  ns/op
SecondarySupersLookup.testNegative61                        avgt  15    135.462 ±    1.037  ns/op
SecondarySupersLookup.testNegative62                        avgt  15    137.805 ±    0.999  ns/op
SecondarySupersLookup.testNegative63                        avgt  15    139.335 ±    1.164  ns/op
SecondarySupersLookup.testNegative64                        avgt  15    141.401 ±    0.947  ns/op
SecondarySupersLookup.testPositive01                        avgt  15     10.731 ±    0.152  ns/op
SecondarySupersLookup.testPositive02                        avgt  15     10.726 ±    0.142  ns/op
SecondarySupersLookup.testPositive03                        avgt  15     10.728 ±    0.145  ns/op
SecondarySupersLookup.testPositive04                        avgt  15     10.730 ±    0.149  ns/op
SecondarySupersLookup.testPositive05                        avgt  15     10.730 ±    0.151  ns/op
SecondarySupersLookup.testPositive06                        avgt  15     10.730 ±    0.150  ns/op
SecondarySupersLookup.testPositive07                        avgt  15     10.731 ±    0.148  ns/op
SecondarySupersLookup.testPositive08                        avgt  15     10.730 ±    0.150  ns/op
SecondarySupersLookup.testPositive09                        avgt  15     10.730 ±    0.151  ns/op
SecondarySupersLookup.testPositive10                        avgt  15     10.734 ±    0.156  ns/op
SecondarySupersLookup.testPositive16                        avgt  15     10.742 ±    0.160  ns/op
SecondarySupersLookup.testPositive20                        avgt  15     10.730 ±    0.150  ns/op
SecondarySupersLookup.testPositive30                        avgt  15     10.735 ±    0.156  ns/op
SecondarySupersLookup.testPositive32                        avgt  15     10.731 ±    0.147  ns/op
SecondarySupersLookup.testPositive40                        avgt  15     10.744 ±    0.179  ns/op
SecondarySupersLookup.testPositive50                        avgt  15     10.733 ±    0.152  ns/op
SecondarySupersLookup.testPositive60                        avgt  15     10.748 ±    0.189  ns/op
SecondarySupersLookup.testPositive63                        avgt  15     10.726 ±    0.142  ns/op
SecondarySupersLookup.testPositive64                        avgt  15     10.733 ±    0.155  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt  12  51592.312 ± 6662.713  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt  12  50294.723 ±  371.807  ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC           avgt  12  53752.017 ±  346.287  ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC             avgt  12  50053.321 ±  642.562  ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC  avgt  12   1830.935 ±  262.496  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC    avgt  12   1745.503 ±  201.047  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC   avgt  12   1794.322 ±  283.656  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC     avgt  12   1808.875 ±  235.126  ms/op

With patch:

Benchmark                                                   Mode Cnt      Score      Error  Units
SecondarySuperCacheHits.test                                avgt  15     11.382 ±    0.027  ns/op
SecondarySuperCacheInterContention.test                     avgt  15    621.291 ±   40.619  ns/op
SecondarySuperCacheInterContention.test:t1                  avgt  15    635.382 ±   58.347  ns/op
SecondarySuperCacheInterContention.test:t2                  avgt  15    607.200 ±   65.490  ns/op
SecondarySupersLookup.testNegative00                        avgt  15     13.275 ±    0.223  ns/op
SecondarySupersLookup.testNegative01                        avgt  15     13.264 ±    0.201  ns/op
SecondarySupersLookup.testNegative02                        avgt  15     13.261 ±    0.194  ns/op
SecondarySupersLookup.testNegative03                        avgt  15     13.271 ±    0.210  ns/op
SecondarySupersLookup.testNegative04                        avgt  15     13.265 ±    0.201  ns/op
SecondarySupersLookup.testNegative05                        avgt  15     13.258 ±    0.191  ns/op
SecondarySupersLookup.testNegative06                        avgt  15     13.280 ±    0.225  ns/op
SecondarySupersLookup.testNegative07                        avgt  15     13.268 ±    0.201  ns/op
SecondarySupersLookup.testNegative08                        avgt  15     13.266 ±    0.202  ns/op
SecondarySupersLookup.testNegative09                        avgt  15     13.261 ±    0.196  ns/op
SecondarySupersLookup.testNegative10                        avgt  15     13.268 ±    0.198  ns/op
SecondarySupersLookup.testNegative16                        avgt  15     13.268 ±    0.205  ns/op
SecondarySupersLookup.testNegative20                        avgt  15     13.284 ±    0.231  ns/op
SecondarySupersLookup.testNegative30                        avgt  15     13.281 ±    0.226  ns/op
SecondarySupersLookup.testNegative32                        avgt  15     13.273 ±    0.215  ns/op
SecondarySupersLookup.testNegative40                        avgt  15     13.287 ±    0.233  ns/op
SecondarySupersLookup.testNegative50                        avgt  15     13.292 ±    0.242  ns/op
SecondarySupersLookup.testNegative55                        avgt  15     53.064 ±    0.757  ns/op
SecondarySupersLookup.testNegative56                        avgt  15     53.052 ±    0.767  ns/op
SecondarySupersLookup.testNegative57                        avgt  15     53.068 ±    0.803  ns/op
SecondarySupersLookup.testNegative58                        avgt  15     53.076 ±    0.776  ns/op
SecondarySupersLookup.testNegative59                        avgt  15     53.095 ±    0.846  ns/op
SecondarySupersLookup.testNegative60                        avgt  15     75.106 ±    1.033  ns/op
SecondarySupersLookup.testNegative61                        avgt  15     76.832 ±    4.047  ns/op
SecondarySupersLookup.testNegative62                        avgt  15     75.085 ±    1.010  ns/op
SecondarySupersLookup.testNegative63                        avgt  15    153.709 ±    0.893  ns/op
SecondarySupersLookup.testNegative64                        avgt  15    155.623 ±    0.922  ns/op
SecondarySupersLookup.testPositive01                        avgt  15     10.727 ±    0.145  ns/op
SecondarySupersLookup.testPositive02                        avgt  15     10.734 ±    0.157  ns/op
SecondarySupersLookup.testPositive03                        avgt  15     10.731 ±    0.151  ns/op
SecondarySupersLookup.testPositive04                        avgt  15     10.733 ±    0.156  ns/op
SecondarySupersLookup.testPositive05                        avgt  15     10.742 ±    0.168  ns/op
SecondarySupersLookup.testPositive06                        avgt  15     10.729 ±    0.148  ns/op
SecondarySupersLookup.testPositive07                        avgt  15     10.738 ±    0.163  ns/op
SecondarySupersLookup.testPositive08                        avgt  15     10.736 ±    0.159  ns/op
SecondarySupersLookup.testPositive09                        avgt  15     10.735 ±    0.158  ns/op
SecondarySupersLookup.testPositive10                        avgt  15     10.730 ±    0.150  ns/op
SecondarySupersLookup.testPositive16                        avgt  15     10.734 ±    0.157  ns/op
SecondarySupersLookup.testPositive20                        avgt  15     10.731 ±    0.149  ns/op
SecondarySupersLookup.testPositive30                        avgt  15     10.733 ±    0.156  ns/op
SecondarySupersLookup.testPositive32                        avgt  15     10.729 ±    0.149  ns/op
SecondarySupersLookup.testPositive40                        avgt  15     10.732 ±    0.154  ns/op
SecondarySupersLookup.testPositive50                        avgt  15     10.735 ±    0.154  ns/op
SecondarySupersLookup.testPositive60                        avgt  15     10.736 ±    0.157  ns/op
SecondarySupersLookup.testPositive63                        avgt  15     10.733 ±    0.155  ns/op
SecondarySupersLookup.testPositive64                        avgt  15     10.729 ±    0.149  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt  12  52942.830 ±  851.030  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt  12  49901.169 ±
663.837 ns/op TypePollution.instanceOfInterfaceSwitchTableNoSCC avgt 12 41443.364 ? 1045.602 ns/op TypePollution.instanceOfInterfaceSwitchTableSCC avgt 12 38050.773 ? 1301.636 ns/op TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC avgt 12 1634.976 ? 5.730 ms/op TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC avgt 12 1613.755 ? 199.612 ms/op TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC avgt 12 1979.354 ? 207.235 ms/op TypePollution.parallelInstanceOfInterfaceSwitchTableSCC avgt 12 1525.198 ? 167.922 ms/op ------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Fix client VM build - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - 8332587: RISC-V: secondary_super_cache does not scale well Changes: https://git.openjdk.org/jdk/pull/19320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332587 Stats: 370 lines in 6 files changed: 370 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From dholmes at openjdk.org Mon Jun 3 07:17:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 3 Jun 2024 07:17:02 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case In-Reply-To: References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Mon, 3 Jun 2024 06:36:50 GMT, Thomas Stuefe wrote: >> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. >> >> Adds unit testing for the specialized cases. >> >> See JBS for discussion of other suggestions. >> >> Testing: - tiers 1-4 >> >> Thanks > > src/hotspot/share/utilities/ostream.hpp line 74: > >> 72: // of the returned string. 
>> 73: //
>> 74: // In a debug build, if truncation occurs a VM warning is issued.
>
> I had to think a bit (I am not a native English speaker) about what the "Nominally" means, but I think it is supposed to contrast the second paragraph? As in "Normally we do that, but in the case of ... we do... ?". Same for "idiomatically" - what does that signify?

Right, "nominally" is indicating that it basically operates one way but there are exceptions as outlined in the second paragraph.

"idiomatically" means we are applying a specific coding idiom aka pattern - in this case secure programming says you never pass a non-constant string to a printf-like function, but instead pass "%s" and supply the actual string as the argument. So when we encounter that idiom we will handle it specially.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1623889256

From dholmes at openjdk.org Mon Jun 3 07:25:02 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 3 Jun 2024 07:25:02 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case
In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
Message-ID:

On Sun, 2 Jun 2024 22:05:40 GMT, David Holmes wrote:

> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued.
>
> Adds unit testing for the specialized cases.
>
> See JBS for discussion of other suggestions.
>
> Testing: - tiers 1-4
>
> Thanks

Thanks for looking at this Thomas!

> So, just to clarify, this is a behavioral change, right?
>
> Where before we would not truncate raw strings, now we do, since result_len is used to determine how many bytes we will write to the output sink.

No, the only change in behaviour is issuing the warning in all cases of truncation. The `result_len` handling is the same as before.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19512#issuecomment-2144459529

From dholmes at openjdk.org Mon Jun 3 07:30:03 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 3 Jun 2024 07:30:03 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case
In-Reply-To:
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
Message-ID:

On Mon, 3 Jun 2024 06:25:54 GMT, Thomas Stuefe wrote:

>> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued.
>>
>> Adds unit testing for the specialized cases.
>>
>> See JBS for discussion of other suggestions.
>>
>> Testing: - tiers 1-4
>>
>> Thanks
>
> src/hotspot/share/utilities/ostream.cpp line 109:
>
>> 107: result_len = buflen - 1;
>> 108: }
>> 109: } else if (format[0] == '%' && format[1] == 's' && format[2] == '\0') {
>
> nit, preexisting: why not just `strncmp(format, "%s", 3) == 0`?

Yep that works too. :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1623904624

From dholmes at openjdk.org Mon Jun 3 08:04:18 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 3 Jun 2024 08:04:18 GMT
Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v4]
In-Reply-To: <778pi5AHHgXZdUEBV45R0Npj1wZPeuAHwWdrygWR830=.d67d6249-0dcd-4fef-976f-6911432d53f8@github.com>
References: <778pi5AHHgXZdUEBV45R0Npj1wZPeuAHwWdrygWR830=.d67d6249-0dcd-4fef-976f-6911432d53f8@github.com>
Message-ID:

On Fri, 31 May 2024 08:07:36 GMT, Serguei Spitsyn wrote:

>> The following RFE was fixed recently:
>> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code
>>
>> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++.
>> This update is to make it clear that `nullptr` is C programming language `null` pointer.
>>
>> I think we do not need a CSR for this fix.
>>
>> Testing: N/A (not needed)
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   review: more null pointer corrections

The general rules are to either say "a null pointer" (possibly with capital A depending on context), or just "null". And in most cases you could choose either. I made various suggestions but really it is up to you. It is hard to get a sense of consistency from these small fragments.

The word "null" should never be in code font as it is not a programming language entity.

Thanks for your patience and perseverance on this.

src/hotspot/share/prims/jvmti.xml line 1101:

> 1099:
> 1100: On return, a pointer to the beginning of the allocated memory.
> 1101: If size is zero, null pointer is returned.

If saying "null pointer" then it should be "a null pointer".

src/hotspot/share/prims/jvmti.xml line 1533:

> 1531:
> 1532:
> 1533: On return, points to the current thread, or null.

remove code font

src/hotspot/share/prims/jvmti.xml line 1996:

> 1994:
> 1995: The thread group to which this thread belongs.
> 1996: null pointer if the thread has terminated.

Just "Null if ..."

src/hotspot/share/prims/jvmti.xml line 2142:

> 2140:
> 2141: On return, filled with the current contended monitor, or
> 2142: null pointer if there is none.

Just "null"

src/hotspot/share/prims/jvmti.xml line 2262:

> 2260:
> 2261:
> 2262: null pointer is passed to the start function

A null ...

src/hotspot/share/prims/jvmti.xml line 2353:

> 2351: If thread-local storage has not been set with
> 2352: the returned
> 2353: pointer is null.

Remove code font

src/hotspot/share/prims/jvmti.xml line 4277:

> 4275: ,
> 4276: or .
> 4277: Otherwise null pointer.

a null ...

src/hotspot/share/prims/jvmti.xml line 4322:

> 4320: points to the zero if the referrer
> 4321: object is not tagged.
> 4322: null pointer if the referrer in not an object (that is,

Null if ...

src/hotspot/share/prims/jvmti.xml line 4769:

> 4767:
> 4768:
> 4769: null pointer is passed as the user supplied data

a null pointer

src/hotspot/share/prims/jvmti.xml line 4945:

> 4943:
> 4944:
> 4945: null pointer is passed as the user supplied data

a null pointer

src/hotspot/share/prims/jvmti.xml line 5593:

> 5591:
> 5592:
> 5593: null pointer is passed as the user supplied data

a null pointer

src/hotspot/share/prims/jvmti.xml line 5637:

> 5635: callback will not be called until the appropriate callback has been called
> 5636: for all roots. If the callback is
> 5637: specified as null pointer then this function returns after

Just "as null then"

src/hotspot/share/prims/jvmti.xml line 5692:

> 5690:
> 5691: >null pointer is passed as the user supplied data
> 5692:

a null pointer

src/hotspot/share/prims/jvmti.xml line 5750:

> 5748:
> 5749:
> 5750: null pointer is passed as the user supplied data

a null pointer

src/hotspot/share/prims/jvmti.xml line 6770:

> 6768: If a named module is defined to the class loader and it
> 6769: contains the package then that named module is returned,
> 6770: otherwise null pointer is returned.

Just "null"

src/hotspot/share/prims/jvmti.xml line 6803:

> 6801:
> 6802: On return, points to a java.lang.Module object
> 6803: or points to null.

Remove code font

src/hotspot/share/prims/jvmti.xml line 7264:

> 7262: modified UTF-8 string.
> 7263: If there is no generic signature attribute for the class, then,
> 7264: on return, points to null.

Remove code font

src/hotspot/share/prims/jvmti.xml line 7794:

> 7792: If the class was not created by a class loader
> 7793: or if the class loader is the bootstrap class loader,
> 7794: points to null.

Remove code font

src/hotspot/share/prims/jvmti.xml line 8414:

> 8412: modified UTF-8 string.
> 8413: If there is no generic signature attribute for the field, then,
> 8414: on return, points to null.

Remove code font

src/hotspot/share/prims/jvmti.xml line 8607:

> 8605: modified UTF-8 string.
> 8606: If there is no generic signature attribute for the method, then,
> 8607: on return, points to null.

Remove code font

src/hotspot/share/prims/jvmti.xml line 9243:

> 9241: of 1.
> 9242: Calling SetNativeMethodPrefix with
> 9243: null pointer is the same as calling this function with

a null pointer

src/hotspot/share/prims/jvmti.xml line 11657:

> 11655:
> 11656:
> 11657: value is set to null

Remove code font

src/hotspot/share/prims/jvmti.xml line 11951:

> 11949:
> 11950:
> 11951: Pointer is unexpectedly null.

Remove code font.

src/hotspot/share/prims/jvmti.xml line 12458:

> 12456:
> 12457: Object with the field being accessed if the field is an
> 12458: instance field; null pointer otherwise

a null pointer

src/hotspot/share/prims/jvmti.xml line 12528:

> 12526:
> 12527: Object with the field being modified if the field is an
> 12528: instance field; null pointer otherwise

a null pointer

src/hotspot/share/prims/jvmti.xml line 12879:

> 12877:
> 12878:
> 12879: Class that will catch the exception, or null pointer if no known catch

Just "null"

src/hotspot/share/prims/jvmti.xml line 12885:

> 12883:
> 12884:
> 12885: Method that will catch the exception, or null pointer if no known catch

Just "null"

src/hotspot/share/prims/jvmti.xml line 13397:

> 13395: redefined or
> 13396: retransformed.
> 13397: null pointer if sent by class load.

A null pointer

src/hotspot/share/prims/jvmti.xml line 13404:

> 13402:
> 13403: The class loader loading the class.
> 13404: null pointer if the bootstrap class loader.

A null pointer

src/hotspot/share/prims/jvmti.xml line 13414:

> 13412: modified UTF-8 string.
> 13413: Note: if the class is defined with a null pointer name or
> 13414: without a name specified, name will be null.

How do you not specify a name other than by passing "null" for the name??

src/hotspot/share/prims/jvmti.xml line 13632:

> 13630:
> 13631: to start_address-1 of the next entry.
> 13632: null pointer if mapping information cannot be supplied.

A null pointer

src/hotspot/share/prims/jvmti.xml line 14695:

> 14693:
> 14694:
> 14695: Allow null pointer as RunAgentThread arg.

Just "null"

src/hotspot/share/prims/jvmti.xml line 14705:

> 14703:
> 14704:
> 14705: Change GetFieldName to allow null pointer like GetMethodName.

Just null

src/hotspot/share/prims/jvmti.xml line 14814:

> 14812: Clarify semantics of raw monitors.
> 14813: Change flags on GetThreadStatus.
> 14814: GetClassLoader return null pointer for the bootstrap class loader.

a null pointer

src/hotspot/share/prims/jvmti.xml line 14824:

> 14822:
> 14823: Define the data type jvmtiEventCallbacks.
> 14824: Zero length allocations return null pointer.

a null pointer

src/hotspot/share/prims/jvmti.xml line 14846:

> 14844: remove GetHeapRoots, add reachable iterators,
> 14845: and rename "annotation" to "tag".
> 14846: null pointer thread parameter on most functions is current

A null pointer

src/hotspot/share/prims/jvmti.xsl line 1589:

> 1587:
> 1588: is
> 1589: null pointer

Remove code font

src/hotspot/share/prims/jvmtiEnv.xsl line 139:

> 137:
> 138:
> 139: // method - pre-checked for validity, but may be null pointer meaning obsolete method

Just "null"

src/hotspot/share/prims/jvmtiEnv.xsl line 169:

> 167: //
> 168:
> 169: - pre-checked for null pointer

Just null

src/hotspot/share/prims/jvmtiEnv.xsl line 175:

> 173: //
> 174:
> 175: - null pointer is a valid value, must be checked

Just null

src/hotspot/share/prims/jvmtiLib.xsl line 180:

> 178:
> 179: is
> 180: null pointer, .

Just null (no code font)

src/hotspot/share/prims/jvmtiLib.xsl line 381:

> 379:
> 380: is
> 381: null pointer, the current thread is used.

Just null (no code font)

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19257#pullrequestreview-2093011060
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623916928
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623917438
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623918169
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623920264
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623920788
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623921174
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623922170
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623923033
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623923386
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623923954
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623924658
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623925443
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623925864
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623926064
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623926453
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623926975
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623927420
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623927827
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623928337
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623928549
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623928967
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623930079
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623930308
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623931169
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623931569
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623932080
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623932303
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623932917
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623933143
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623934935
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623935320
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623935750
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623936049
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623936478
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623936755
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623937128
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623938553
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623938932
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623939238
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623939410
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623940127
PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1623941054

From mli at openjdk.org Mon Jun 3 08:19:08 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 08:19:08 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To:
References:
Message-ID: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com>

On Fri, 31 May 2024 14:40:15 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review the patch?
>> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintenance.
>> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction.
>>
>> Thanks!
>>
>> * Tests are still running, so far so good.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   adjust accessibility

> Hi Hamlin, this will cause merge hell for me in #19453.
>
> You have a bunch of stuff in hpp files which can be moved to cpp file, to keep headers clean and not export so. E.g. check_movptr1_data_dependency is only used in macroAssembler_riscv.cpp, now everyone including MASM.hpp gets a copy of this method (which is compiled away). I.e. if you want it to be a member method move the definition to cpp and just keep the declaration in hpp.

I'm not sure. If this is a problem in this patch, then wouldn't the original implementation, where MASM.hpp includes nativeInst.hpp, cause a similar issue?
On the other hand, will unused methods in a header file still exist in the compiled binary?

The reason I do it this way is that I just want to refactor the necessary things (the principle is that only nativeInst depends on macroAssembler, not the reverse); for other things, I try to keep them as they originally were. So if there are other optimization opportunities, it's better to do them in separate PRs.

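
The header-versus-source point raised in this exchange can be sketched in a single file. This is only an illustration under stated assumptions, not HotSpot code: `Masm`, `is_aligned`, `check_data_dependency`, and `low_bits_clear` are hypothetical names, and the comments mark which part would conceptually sit in a header and which in a source file.

```cpp
#include <cstdint>

// --- would live in a header (hypothetical masm.hpp) ---
class Masm {
 public:
  // Defined inside the class body, so implicitly inline: every translation
  // unit that includes the header sees the full definition.
  static bool is_aligned(uintptr_t addr) { return (addr & 3) == 0; }

  // Declaration only: includers see the signature, not the body.
  static bool check_data_dependency(uintptr_t addr);
};

// --- would live in a source file (hypothetical masm.cpp) ---
// File-local helper with internal linkage; invisible to other files.
static bool low_bits_clear(uintptr_t addr) { return (addr & 0xfff) == 0; }

// Out-of-line definition: compiled once, in this file only.
bool Masm::check_data_dependency(uintptr_t addr) {
  return is_aligned(addr) && low_bits_clear(addr);
}
```

On the question of unused methods: an inline function that a translation unit never uses normally produces no code in that unit with mainstream compilers, so the cost of keeping it in a header is mostly compile time and a wider visible interface rather than binary size. That is a general observation, not a measurement of this patch.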
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2144563334

From aph at openjdk.org Mon Jun 3 08:20:10 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 3 Jun 2024 08:20:10 GMT
Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v7]
In-Reply-To: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com>
References: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com>
Message-ID:

On Mon, 3 Jun 2024 06:45:22 GMT, kuaiwei wrote:

>> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues:
>> 1 It shows a regression on some platforms, like Apple silicon on macOS
>> 2 It cannot handle an instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld"
>>
>> It can be fixed by:
>> 1 Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we can see a performance improvement (N1 or N2)
>> 2 Checking for the special pattern and merging the subsequent dmb.
>>
>> It also fixes a bug: when the code buffer is expanding, st/ld/dmb cannot be merged. I added unit tests for these.
>>
>> This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I think it's not a common pattern.
>>
>> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging, but it looks risky to keep an instruction pending during emission.
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use constexpr for test encoding

Marked as reviewed by aph (Reviewer).

I think we're done here. Thank you for your patience and hard work.

@shipilev, are you happy too?

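
The barrier merging described in the quoted PR text can be modelled roughly as follows. This is a simplified sketch, not the HotSpot assembler: the `Insn` encoding and the `emit` helper are hypothetical, and it models only an "always merge" policy under the assumption that replacing two different barrier kinds with one full barrier is safe because the full barrier is at least as strong as either. It does not model keeping `dmb.ishld` and `dmb.ishst` as separate instructions, which is what the PR does on the cores where that is faster.

```cpp
#include <vector>

// Hypothetical encoding: bit 0 = orders loads, bit 1 = orders stores.
enum Insn { OTHER = 0, DMB_LD = 1, DMB_ST = 2, DMB_FULL = 3 };

// Append `next` to `code`. If `next` is a barrier and the previous
// instruction is also a barrier, fold `next` into it in place by OR-ing
// the kinds: same kind stays as is, different kinds become DMB_FULL.
// The buffer never shrinks, because at most one barrier is ever pending.
void emit(std::vector<Insn>& code, Insn next) {
  if (next != OTHER && !code.empty() && code.back() != OTHER) {
    code.back() = static_cast<Insn>(code.back() | next);
    return;
  }
  code.push_back(next);
}
```

Feeding `DMB_LD, DMB_ST, DMB_LD, DMB_LD` through `emit` leaves a single `DMB_FULL` in the buffer, while barriers separated by an `OTHER` instruction stay separate.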
-------------

PR Review: https://git.openjdk.org/jdk/pull/19278#pullrequestreview-2093108863
PR Review: https://git.openjdk.org/jdk/pull/19278#pullrequestreview-2093115776

From shade at openjdk.org Mon Jun 3 08:20:10 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 3 Jun 2024 08:20:10 GMT
Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v7]
In-Reply-To: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com>
References: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com>
Message-ID:

On Mon, 3 Jun 2024 06:45:22 GMT, kuaiwei wrote:

>> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues:
>> 1 It shows a regression on some platforms, like Apple silicon on macOS
>> 2 It cannot handle an instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld"
>>
>> It can be fixed by:
>> 1 Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we can see a performance improvement (N1 or N2)
>> 2 Checking for the special pattern and merging the subsequent dmb.
>>
>> It also fixes a bug: when the code buffer is expanding, st/ld/dmb cannot be merged. I added unit tests for these.
>>
>> This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I think it's not a common pattern.
>>
>> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging, but it looks risky to keep an instruction pending during emission.
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use constexpr for test encoding

Yes, I believe we are done. Good job!

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19278#pullrequestreview-2093121726

From mli at openjdk.org Mon Jun 3 08:30:02 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 08:30:02 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To:
References:
Message-ID:

On Tue, 21 May 2024 08:31:53 GMT, Gui Cao wrote:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb:
> Original:
>
> Benchmark                                                   Mode  Cnt      Score      Error  Units
> SecondarySuperCacheHits.test                                avgt   15     11.375 ±    0.071  ns/op
> SecondarySuperCacheInterContention.test                     avgt   15    646.087 ±   32.587  ns/op
> SecondarySuperCacheInterContention.test:t1                  avgt   15    600.090 ±   83.779  ns/op
> SecondarySuperCacheInterContention.test:t2                  avgt   15    692.084 ±   73.218  ns/op
> SecondarySupersLookup.testNegative00                        avgt   15     16.420 ±    0.239  ns/op
> SecondarySupersLookup.testNegative01                        avgt   15     18.307 ±    0.260  ns/op
> SecondarySupersLookup.testNegative02                        avgt   15     21.695 ±    0.458  ns/op
> SecondarySupersLookup.testNegative03                        avgt   15     24.855 ±    0.664  ns/op
> SecondarySupersLookup.testNegative04                        avgt   15     27.305 ±    0.522  ns/op
> SecondarySupersLookup.testNegative05                        avgt   15     29.719 ±    0.385  ns/op
> SecondarySupersLookup.testNegative06                        avgt   15     32.231 ±    0.498  ns/op
> SecondarySupersLookup.testNegative07                        avgt   15     33.747 ±    0.603  ns/op
> SecondarySupersLookup.testNegative08                        avgt   15     35.856 ±    0.629  ns/op
> SecondarySupersLookup.testNegative09                        avgt   15     37.077 ±    0.546  ns/op
> SecondarySupersLookup.testNegative10                        avgt   15     39.408 ±    0.465  ns/op
> SecondarySupersLookup.testNegative16                        avgt   15     51.041 ±    0.547  ns/op
> SecondarySupersLookup.testNegative20                        avgt   15     58.722 ±    0.922  ns/op
> SecondarySupersLookup.testNegative30                        avgt   15     77.310 ±    0.654  ns/op
> SecondarySupersLookup.testNegative32                        avgt   15     81.116 ±    0.854  ns/op
> SecondarySupersLookup.testNegative40                        avgt   15     96.311 ±    0.840  ns/op
> SecondarySupersLookup.testNegative50                        avgt   15    115.427 ±    0.838  ns/op
> SecondarySupersLookup.testNegative55                        avgt   15    124.371 ±    1.076  ns/op
> SecondarySupersLookup.testNegative56                        avgt   15    126.796 ±    0.916  ns/op
> SecondarySupersLookup.testNegative57                        avgt   15    127.952 ±    1.202  ns/op
> SecondarySupersLookup.testNegative58                        avgt   15    131.956 ±    4.515  ns/op
> SecondarySupersLookup.testNegative59                        avgt   15    131.858 ±    1.066  ns/op
> SecondarySupersLookup.testNegative60...

Hey, is there performance comparison data when zbb is not supported? Just want to make sure it can also get performance gain in that situation.
Otherwise, maybe we should only enable it iff zbb is supported?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2144591938

From dholmes at openjdk.org Mon Jun 3 08:33:06 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 3 Jun 2024 08:33:06 GMT
Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3]
In-Reply-To:
References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com>
Message-ID:

On Mon, 3 Jun 2024 06:08:37 GMT, Calvin Cheung wrote:

>> IMO a dedicated flag (`ProfileClassLinkage`) is well-justified here. `-Xlog:init` prints some data which is not collected by default. So, if a user explicitly specifies `-Xlog:init`, the expectation is JVM automatically enables relevant profiling logic. There's no need to require a user to explicitly specify another flag.
>>
>> (The main reason the data is not collected by default, unlike most of PerfData counters, is because Calvin spotted some negative effects on startup when profiling is turned on.)
>>
>> Speaking of the extra flag itself, it can be achieved solely by consulting whether `-Xlog:init` is enabled or not (replace `ProfileClassLinkage` with `log_is_enabled(Info, init)` checks). But I find it clearer and more convenient to control profiling logic with a dedicated flag. As a bonus, it fits nicely with the rest of PerfData framework: when `-XX:+ProfileClassLinkage` is specified w/o `-Xlog:init`, it is still possible to dump the data using external tools (jcmd and jstat).
>
> Thanks @iwanowww for chiming in.
> I've changed `arguments.cpp` back to the first version.

I'm getting confused. The ProfileClassLinkage flag should control whether the counters are initialized and used. The -Xlog:init (perhaps with a better name/tag!) should control whether they get printed - in theory you could choose to print under any logging setting. I don't think either flag should imply/force the setting of the other.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1624012590

From luhenry at openjdk.org Mon Jun 3 08:36:10 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Mon, 3 Jun 2024 08:36:10 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 31 May 2024 14:40:15 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review the patch?
>> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintenance.
>> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction.
>>
>> Thanks!
>>
>> * Tests are still running, so far so good.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   adjust accessibility

Marked as reviewed by luhenry (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19459#pullrequestreview-2093160846

From sgehwolf at openjdk.org Mon Jun 3 09:23:12 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Mon, 3 Jun 2024 09:23:12 GMT
Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v3]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 3 May 2024 16:05:30 GMT, Severin Gehwolf wrote:

>> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is being indicated with the following trace log line:
>>
>>
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
>>
>>
>> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example:
>>
>>
>> java -XshowSettings:system --version
>> Operating System Metrics:
>> Provider: cgroupv1
>> System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>>
>>
>> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests.
Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - Fix tests > - Implement Metrics.isContainerized() > - Some clean-up > - Drop cgroups testing on plain Linux > - Implement fall-back logic for non-ro controller mounts > - ... and 2 more: https://git.openjdk.org/jdk/compare/88976cae...434430ca Keep open bot. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2144706137 From tschatzl at openjdk.org Mon Jun 3 09:28:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Jun 2024 09:28:15 GMT Subject: Integrated: 8332936: Test vmTestbase/metaspace/gc/watermark_70_80/TestDescription.java fails with no GC's recorded In-Reply-To: References: Message-ID: On Tue, 28 May 2024 09:25:29 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change to exclude the watermark tests from use with -Xcomp. 
> > The failures reported are related to -Xcomp triggering the wrong kind of garbage collection pauses (CodeCache related GCs instead of Metadata related GCs) the test then fails on. > > The proposed solution is to just disable the tests with -Xcomp: the tests are not related to compilation at all. > > Testing: local, gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: 5ed0d52c Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/5ed0d52c8424dd2e7f1ac2404e9fabb40c8402b8 Stats: 4 lines in 4 files changed: 4 ins; 0 del; 0 mod 8332936: Test vmTestbase/metaspace/gc/watermark_70_80/TestDescription.java fails with no GC's recorded Reviewed-by: stefank, ayang ------------- PR: https://git.openjdk.org/jdk/pull/19421 From tschatzl at openjdk.org Mon Jun 3 09:28:14 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Jun 2024 09:28:14 GMT Subject: RFR: 8332936: Test vmTestbase/metaspace/gc/watermark_70_80/TestDescription.java fails with no GC's recorded In-Reply-To: References: Message-ID: On Tue, 28 May 2024 09:35:42 GMT, Stefan Karlsson wrote: >> Hi all, >> >> please review this change to exclude the watermark tests from use with -Xcomp. >> >> The failures reported are related to -Xcomp triggering the wrong kind of garbage collection pauses (CodeCache related GCs instead of Metadata related GCs) the test then fails on. >> >> The proposed solution is to just disable the tests with -Xcomp: the tests are not related to compilation at all. >> >> Testing: local, gha >> >> Thanks, >> Thomas > > Marked as reviewed by stefank (Reviewer). 
Thanks @stefank @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/19421#issuecomment-2144715120 From sspitsyn at openjdk.org Mon Jun 3 09:58:38 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Jun 2024 09:58:38 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. > > Testing: N/A (not needed) Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: consistency and stylistical corrections ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19257/files - new: https://git.openjdk.org/jdk/pull/19257/files/48ba8f5d..ed2eff27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=03-04 Stats: 43 lines in 4 files changed: 0 ins; 0 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/19257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19257/head:pull/19257 PR: https://git.openjdk.org/jdk/pull/19257 From sspitsyn at openjdk.org Mon Jun 3 09:58:38 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Jun 2024 09:58:38 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v4] In-Reply-To: References: <778pi5AHHgXZdUEBV45R0Npj1wZPeuAHwWdrygWR830=.d67d6249-0dcd-4fef-976f-6911432d53f8@github.com> Message-ID: On Mon, 3 Jun 2024 08:01:25 GMT, David Holmes wrote: > The general rules are to either say "a null pointer" (possibly with capital A depending on 
context), or just "null". And in most cases you could choose either. I made various suggestions but really it is up to you. It is hard to get a sense of consistency from these small fragments.
>
> The word "null" should never be in code font as it is not a programming language entity.

Thank you, David. This is really useful. I've pushed an update with all changes you suggested.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19257#issuecomment-2144777338

From stuefe at openjdk.org Mon Jun 3 10:02:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 3 Jun 2024 10:02:16 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v120]
In-Reply-To: <1GrkKo76eQlJz-RRYFMysxygcQIneEHzpygUqbJ_odU=.f611e2ed-963a-4c34-920c-c7ffc4c2e282@github.com>
References: <1GrkKo76eQlJz-RRYFMysxygcQIneEHzpygUqbJ_odU=.f611e2ed-963a-4c34-920c-c7ffc4c2e282@github.com>
Message-ID: 

On Thu, 30 May 2024 13:17:17 GMT, Johan Sjölen wrote:

> This is a form of close-to-but-not-really double-accounting. My question is: Is this acceptable? I'm asking the ZGC people about this now.

My opinion: short term, yes. If we want to get this into 23, we should avoid making any functional changes now.

Long term, maybe also yes. The problem is, regardless of what we do, ZGC semantics are difficult to press into the normal reserved/commit scheme.

IIUC (and I may not :), ZGC does 1 memfd_create, *then* 2 fallocate, *then* 3 maps that into the user address space. So, after (2), there is a time window where the process uses more memory than its vsize. That is because, IIUC, the allocation done with fallocate() allocates memory in the kernel on behalf of our process. But it's not accounted to our process, right? Not visible in any metric.

A) If we count the fallocate as only committed, not reserved, we can have the situation where the sum of committed memory is larger than reserved, which will confuse every analyst.
B) If we count the fallocate as reserve+commit, we can see too much reserved memory, possibly larger than the vsize of the process.

I still think (B) is the lesser of two evils. Also, less chance to trip up sanity tests that test reserved vs committed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2144783898

From rehn at openjdk.org Mon Jun 3 10:51:01 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 3 Jun 2024 10:51:01 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com>
References: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com>
Message-ID: 

On Mon, 3 Jun 2024 08:16:42 GMT, Hamlin Li wrote:

> I'm not sure. If this is a problem in this patch, then in original implementation, MASM.hpp includes nativeInst.hpp, it will cause the similar issue? On the other hand, will unused methods in a header file still exist in the compiled binary?

It's about having a readable header; long private methods that are not part of the API clutter it.
They will not exist, but compiling away methods is not free as it prolongs the compile time.

>
> The reason I do it this way is because I just want to refactor necessary things (the principle is that only nativeInst depends on macroAssembler, not in reverse direction), for other things, I try to keep it as original ones. So if there is other optimization opportunities, it's better to do it in separate pr's.

If this is a simple move of code, then I suggest you wait until my patch, as that changes names and doubles the methods which you want to move.

Also, NativeCall::instruction_size, which is used in shared code, is now in nativeInst, but NativeInstruction::instruction_size is in MASM.

From my patch riscv.ad file:

int MachCallDynamicJavaNode::ret_addr_offset()
{
  return NativeMovConstReg::movptr2_instruction_size + NativeInstruction::instruction_size; // movptr2, jal
  if (UseTrampolines) {
    return NativeMovConstReg::movptr2_instruction_size + NativeInstruction::instruction_size; // movptr2, jal
  }
}

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2144881602

From gcao at openjdk.org Mon Jun 3 11:00:00 2024
From: gcao at openjdk.org (Gui Cao)
Date: Mon, 3 Jun 2024 11:00:00 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To: 
References: 
Message-ID: 

On Tue, 21 May 2024 08:31:53 GMT, Gui Cao wrote:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb:
> Original:
>
> Benchmark                                   Mode  Cnt    Score    Error  Units
> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
> SecondarySupersLookup.testNegative59        avgt   15  131.858 ±  1.066  ns/op
> SecondarySupersLookup.testNegative60...

Hi, the following is an implementation using scalar assembly when zbb is not available.
``` diff
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index ce6696c18a8..93e1045d2d4 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3481,6 +3481,36 @@ void MacroAssembler::check_klass_subtype_slow_path(Register sub_klass,
   bind(L_fallthrough);
 }
 
+void MacroAssembler::population_count(Register dst, Register src,
+                                      Register tmp1, Register tmp2) {
+
+  if (UsePopCountInstruction) {
+    cpop(dst, src);
+  } else {
+    assert_different_registers(src, tmp1, tmp2);
+    assert_different_registers(dst, tmp1, tmp2);
+    Label loop, done;
+
+    mv(tmp1, src);
+    // dst = 0;
+    // while(tmp1 != 0) {
+    //   dst++;
+    //   tmp1 &= (tmp1 - 1);
+    // }
+    mv(dst, zr);
+    beqz(tmp1, done);
+    {
+      bind(loop);
+      addi(dst, dst, 1);
+      mv(tmp2, tmp1);
+      addi(tmp2, tmp2, -1);
+      andr(tmp1, tmp1, tmp2);
+      bnez(tmp1, loop);
+    }
+    bind(done);
+  }
+}
+
 // Ensure that the inline code and the stub are using the same registers.
 #define LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS \
 do { \
@@ -3533,7 +3563,7 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
-    cpop(r_array_index, r_array_index);
+    population_count(r_array_index, r_array_index, t0, tmp1);
   } else {
     mv(r_array_index, (u1)1);
   }
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
index 2575d5ea2ff..3e4930d5605 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp
@@ -322,6 +322,8 @@ class MacroAssembler: public Assembler {
                                      Label* L_success,
                                      Label* L_failure);
 
+  void population_count(Register dst, Register src, Register tmp1, Register tmp2);
+
   // As above, but with a constant super_klass.
   // The result is in Register result, not the condition codes.
   bool lookup_secondary_supers_table(Register r_sub_klass,
```

JMH tested on HiFive Unmatched (no Zbb):

Original:

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   28.625 ±  7.158  ns/op
SecondarySupersLookup.testNegative01  avgt   15   33.860 ±  6.312  ns/op
SecondarySupersLookup.testNegative02  avgt   15   30.887 ±  4.773  ns/op
SecondarySupersLookup.testNegative03  avgt   15   39.477 ±  6.945  ns/op
SecondarySupersLookup.testNegative04  avgt   15   34.976 ±  3.080  ns/op
SecondarySupersLookup.testNegative05  avgt   15   42.025 ±  8.324  ns/op
SecondarySupersLookup.testNegative06  avgt   15   49.359 ±  8.480  ns/op
SecondarySupersLookup.testNegative07  avgt   15   49.996 ± 11.841  ns/op
SecondarySupersLookup.testNegative08  avgt   15   58.468 ±  8.485  ns/op
SecondarySupersLookup.testNegative09  avgt   15   57.198 ± 10.803  ns/op
SecondarySupersLookup.testNegative10  avgt   15   63.531 ±  5.595  ns/op
SecondarySupersLookup.testNegative16  avgt   15   73.716 ±  9.231  ns/op
SecondarySupersLookup.testNegative20  avgt   15   88.823 ± 16.179  ns/op
SecondarySupersLookup.testNegative30  avgt   15  118.832 ± 18.866  ns/op
SecondarySupersLookup.testNegative32  avgt   15  126.538 ± 23.139  ns/op
SecondarySupersLookup.testNegative40  avgt   15  149.722 ± 31.675  ns/op
SecondarySupersLookup.testNegative50  avgt   15  186.958 ± 39.203  ns/op
SecondarySupersLookup.testNegative55  avgt   15  193.787 ± 29.629  ns/op
SecondarySupersLookup.testNegative56  avgt   15  204.451 ± 34.491  ns/op
SecondarySupersLookup.testNegative57  avgt   15  204.104 ± 27.130  ns/op
SecondarySupersLookup.testNegative58  avgt   15  207.017 ± 31.201  ns/op
SecondarySupersLookup.testNegative59  avgt   15  219.159 ± 33.664  ns/op
SecondarySupersLookup.testNegative60  avgt   15  208.726 ± 27.195  ns/op
SecondarySupersLookup.testNegative61  avgt   15  214.557 ± 30.992  ns/op
SecondarySupersLookup.testNegative62  avgt   15  212.104 ± 30.843  ns/op
SecondarySupersLookup.testNegative63  avgt   15  227.805 ± 39.706  ns/op
SecondarySupersLookup.testNegative64  avgt   15  229.951 ± 42.039  ns/op
SecondarySupersLookup.testPositive01  avgt   15   18.498 ±  4.687  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.130 ±  4.955  ns/op
SecondarySupersLookup.testPositive03  avgt   15   18.576 ±  4.383  ns/op
SecondarySupersLookup.testPositive04  avgt   15   19.202 ±  4.554  ns/op
SecondarySupersLookup.testPositive05  avgt   15   18.923 ±  4.730  ns/op
SecondarySupersLookup.testPositive06  avgt   15   20.494 ±  6.282  ns/op
SecondarySupersLookup.testPositive07  avgt   15   17.679 ±  2.386  ns/op
SecondarySupersLookup.testPositive08  avgt   15   19.396 ±  7.047  ns/op
SecondarySupersLookup.testPositive09  avgt   15   18.163 ±  2.950  ns/op
SecondarySupersLookup.testPositive10  avgt   15   21.135 ±  5.552  ns/op
SecondarySupersLookup.testPositive16  avgt   15   20.117 ±  4.606  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.209 ±  5.800  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.388 ±  6.792  ns/op
SecondarySupersLookup.testPositive32  avgt   15   19.720 ±  4.559  ns/op
SecondarySupersLookup.testPositive40  avgt   15   17.354 ±  2.707  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.825 ±  6.062  ns/op
SecondarySupersLookup.testPositive60  avgt   15   19.910 ±  5.621  ns/op
SecondarySupersLookup.testPositive63  avgt   15   18.989 ±  3.156  ns/op
SecondarySupersLookup.testPositive64  avgt   15   20.298 ±  5.357  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   21.124 ±  1.601  ns/op
SecondarySupersLookup.testNegative01  avgt   15   25.788 ±  3.713  ns/op
SecondarySupersLookup.testNegative02  avgt   15   25.501 ±  5.616  ns/op
SecondarySupersLookup.testNegative03  avgt   15   22.800 ±  6.454  ns/op
SecondarySupersLookup.testNegative04  avgt   15   21.790 ±  2.629  ns/op
SecondarySupersLookup.testNegative05  avgt   15   25.485 ±  6.082  ns/op
SecondarySupersLookup.testNegative06  avgt   15   24.801 ±  5.387  ns/op
SecondarySupersLookup.testNegative07  avgt   15   24.425 ±  4.686  ns/op
SecondarySupersLookup.testNegative08  avgt   15   24.486 ±  4.044  ns/op
SecondarySupersLookup.testNegative09  avgt   15   23.810 ±  3.838  ns/op
SecondarySupersLookup.testNegative10  avgt   15   25.085 ±  3.756  ns/op
SecondarySupersLookup.testNegative16  avgt   15   22.018 ±  2.924  ns/op
SecondarySupersLookup.testNegative20  avgt   15   23.161 ±  3.271  ns/op
SecondarySupersLookup.testNegative30  avgt   15   23.705 ±  4.669  ns/op
SecondarySupersLookup.testNegative32  avgt   15   25.048 ±  7.125  ns/op
SecondarySupersLookup.testNegative40  avgt   15   24.661 ±  3.541  ns/op
SecondarySupersLookup.testNegative50  avgt   15   22.918 ±  2.879  ns/op
SecondarySupersLookup.testNegative55  avgt   15  250.982 ± 10.224  ns/op
SecondarySupersLookup.testNegative56  avgt   15  251.020 ±  8.432  ns/op
SecondarySupersLookup.testNegative57  avgt   15  255.998 ±  9.054  ns/op
SecondarySupersLookup.testNegative58  avgt   15  257.347 ± 11.340  ns/op
SecondarySupersLookup.testNegative59  avgt   15  277.727 ± 10.007  ns/op
SecondarySupersLookup.testNegative60  avgt   15  304.818 ± 12.092  ns/op
SecondarySupersLookup.testNegative61  avgt   15  308.956 ± 13.060  ns/op
SecondarySupersLookup.testNegative62  avgt   15  309.804 ± 14.715  ns/op
SecondarySupersLookup.testNegative63  avgt   15  416.021 ±  8.051  ns/op
SecondarySupersLookup.testNegative64  avgt   15  425.190 ± 10.966  ns/op
SecondarySupersLookup.testPositive01  avgt   15   18.369 ±  4.490  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.595 ±  6.626  ns/op
SecondarySupersLookup.testPositive03  avgt   15   19.327 ±  4.973  ns/op
SecondarySupersLookup.testPositive04  avgt   15   19.636 ±  4.759  ns/op
SecondarySupersLookup.testPositive05  avgt   15   17.055 ±  2.329  ns/op
SecondarySupersLookup.testPositive06  avgt   15   18.712 ±  3.333  ns/op
SecondarySupersLookup.testPositive07  avgt   15   20.508 ±  4.213  ns/op
SecondarySupersLookup.testPositive08  avgt   15   19.208 ±  3.761  ns/op
SecondarySupersLookup.testPositive09  avgt   15   18.061 ±  3.619  ns/op
SecondarySupersLookup.testPositive10  avgt   15   17.519 ±  3.322  ns/op
SecondarySupersLookup.testPositive16  avgt   15   19.099 ±  4.358  ns/op
SecondarySupersLookup.testPositive20  avgt   15   20.731 ±  5.230  ns/op
SecondarySupersLookup.testPositive30  avgt   15   18.048 ±  2.994  ns/op
SecondarySupersLookup.testPositive32  avgt   15   18.817 ±  3.856  ns/op
SecondarySupersLookup.testPositive40  avgt   15   17.165 ±  2.536  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.060 ±  4.473  ns/op
SecondarySupersLookup.testPositive60  avgt   15   17.296 ±  2.411  ns/op
SecondarySupersLookup.testPositive63  avgt   15   19.313 ±  4.133  ns/op
SecondarySupersLookup.testPositive64  avgt   15   21.258 ±  4.909  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

JMH tested on LicheePi 4A (no Zbb):

Original:

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   26.297 ±  0.941  ns/op
SecondarySupersLookup.testNegative01  avgt   15   26.345 ±  1.404  ns/op
SecondarySupersLookup.testNegative02  avgt   15   27.191 ±  1.421  ns/op
SecondarySupersLookup.testNegative03  avgt   15   28.546 ±  1.475  ns/op
SecondarySupersLookup.testNegative04  avgt   15   28.785 ±  1.329  ns/op
SecondarySupersLookup.testNegative05  avgt   15   30.524 ±  1.816  ns/op
SecondarySupersLookup.testNegative06  avgt   15   30.284 ±  0.984  ns/op
SecondarySupersLookup.testNegative07  avgt   15   31.594 ±  1.093  ns/op
SecondarySupersLookup.testNegative08  avgt   15   32.778 ±  1.269  ns/op
SecondarySupersLookup.testNegative09  avgt   15   33.549 ±  0.913  ns/op
SecondarySupersLookup.testNegative10  avgt   15   35.468 ±  1.643  ns/op
SecondarySupersLookup.testNegative16  avgt   15   68.065 ±  1.890  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.098 ±  2.704  ns/op
SecondarySupersLookup.testNegative30  avgt   15   70.944 ±  2.929  ns/op
SecondarySupersLookup.testNegative32  avgt   15   76.134 ±  3.719  ns/op
SecondarySupersLookup.testNegative40  avgt   15   89.092 ±  4.396  ns/op
SecondarySupersLookup.testNegative50  avgt   15  105.226 ±  4.877  ns/op
SecondarySupersLookup.testNegative55  avgt   15  115.744 ±  6.281  ns/op
SecondarySupersLookup.testNegative56  avgt   15  119.860 ±  5.618  ns/op
SecondarySupersLookup.testNegative57  avgt   15  117.818 ±  5.497  ns/op
SecondarySupersLookup.testNegative58  avgt   15  121.410 ±  6.781  ns/op
SecondarySupersLookup.testNegative59  avgt   15  124.500 ±  7.016  ns/op
SecondarySupersLookup.testNegative60  avgt   15  125.322 ±  5.241  ns/op
SecondarySupersLookup.testNegative61  avgt   15  129.009 ±  4.680  ns/op
SecondarySupersLookup.testNegative62  avgt   15  126.704 ±  5.917  ns/op
SecondarySupersLookup.testNegative63  avgt   15  131.529 ±  5.247  ns/op
SecondarySupersLookup.testNegative64  avgt   15  134.511 ±  4.925  ns/op
SecondarySupersLookup.testPositive01  avgt   15   22.386 ±  0.680  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.655 ±  0.492  ns/op
SecondarySupersLookup.testPositive03  avgt   15   22.123 ±  0.671  ns/op
SecondarySupersLookup.testPositive04  avgt   15   22.050 ±  0.610  ns/op
SecondarySupersLookup.testPositive05  avgt   15   22.048 ±  0.614  ns/op
SecondarySupersLookup.testPositive06  avgt   15   21.850 ±  0.597  ns/op
SecondarySupersLookup.testPositive07  avgt   15   21.844 ±  0.619  ns/op
SecondarySupersLookup.testPositive08  avgt   15   21.832 ±  0.601  ns/op
SecondarySupersLookup.testPositive09  avgt   15   21.743 ±  0.527  ns/op
SecondarySupersLookup.testPositive10  avgt   15   22.037 ±  0.609  ns/op
SecondarySupersLookup.testPositive16  avgt   15   22.300 ±  0.502  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.607 ±  0.498  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.836 ±  0.602  ns/op
SecondarySupersLookup.testPositive32  avgt   15   21.629 ±  0.484  ns/op
SecondarySupersLookup.testPositive40  avgt   15   21.850 ±  0.621  ns/op
SecondarySupersLookup.testPositive50  avgt   15   22.478 ±  0.130  ns/op
SecondarySupersLookup.testPositive60  avgt   15   22.058 ±  0.617  ns/op
SecondarySupersLookup.testPositive63  avgt   15   21.828 ±  0.596  ns/op
SecondarySupersLookup.testPositive64  avgt   15   22.077 ±  0.603  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   24.305 ±  1.269  ns/op
SecondarySupersLookup.testNegative01  avgt   15   23.552 ±  1.214  ns/op
SecondarySupersLookup.testNegative02  avgt   15   22.720 ±  0.771  ns/op
SecondarySupersLookup.testNegative03  avgt   15   22.713 ±  0.834  ns/op
SecondarySupersLookup.testNegative04  avgt   15   22.924 ±  0.684  ns/op
SecondarySupersLookup.testNegative05  avgt   15   22.614 ±  0.604  ns/op
SecondarySupersLookup.testNegative06  avgt   15   22.387 ±  0.641  ns/op
SecondarySupersLookup.testNegative07  avgt   15   22.201 ±  0.502  ns/op
SecondarySupersLookup.testNegative08  avgt   15   22.391 ±  0.606  ns/op
SecondarySupersLookup.testNegative09  avgt   15   22.462 ±  0.617  ns/op
SecondarySupersLookup.testNegative10  avgt   15   22.525 ±  0.202  ns/op
SecondarySupersLookup.testNegative16  avgt   15   22.439 ±  0.616  ns/op
SecondarySupersLookup.testNegative20  avgt   15   22.963 ±  0.298  ns/op
SecondarySupersLookup.testNegative30  avgt   15   22.642 ±  0.621  ns/op
SecondarySupersLookup.testNegative32  avgt   15   22.306 ±  0.670  ns/op
SecondarySupersLookup.testNegative40  avgt   15   22.663 ±  0.644  ns/op
SecondarySupersLookup.testNegative50  avgt   15   22.001 ±  0.238  ns/op
SecondarySupersLookup.testNegative55  avgt   15  128.558 ±  5.735  ns/op
SecondarySupersLookup.testNegative56  avgt   15  128.633 ±  4.893  ns/op
SecondarySupersLookup.testNegative57  avgt   15  129.143 ±  5.955  ns/op
SecondarySupersLookup.testNegative58  avgt   15  132.434 ±  6.478  ns/op
SecondarySupersLookup.testNegative59  avgt   15  130.243 ±  5.901  ns/op
SecondarySupersLookup.testNegative60  avgt   15  163.505 ±  8.278  ns/op
SecondarySupersLookup.testNegative61  avgt   15  163.934 ±  9.008  ns/op
SecondarySupersLookup.testNegative62  avgt   15  162.247 ±  6.238  ns/op
SecondarySupersLookup.testNegative63  avgt   15  213.133 ±  9.582  ns/op
SecondarySupersLookup.testNegative64  avgt   15  214.724 ± 11.562  ns/op
SecondarySupersLookup.testPositive01  avgt   15   21.622 ±  0.482  ns/op
SecondarySupersLookup.testPositive02  avgt   15   21.842 ±  0.602  ns/op
SecondarySupersLookup.testPositive03  avgt   15   22.274 ±  0.516  ns/op
SecondarySupersLookup.testPositive04  avgt   15   21.833 ±  0.632  ns/op
SecondarySupersLookup.testPositive05  avgt   15   21.842 ±  0.603  ns/op
SecondarySupersLookup.testPositive06  avgt   15   21.630 ±  0.527  ns/op
SecondarySupersLookup.testPositive07  avgt   15   22.054 ±  0.581  ns/op
SecondarySupersLookup.testPositive08  avgt   15   21.872 ±  0.613  ns/op
SecondarySupersLookup.testPositive09  avgt   15   21.839 ±  0.604  ns/op
SecondarySupersLookup.testPositive10  avgt   15   21.619 ±  0.494  ns/op
SecondarySupersLookup.testPositive16  avgt   15   21.624 ±  0.509  ns/op
SecondarySupersLookup.testPositive20  avgt   15   21.828 ±  0.595  ns/op
SecondarySupersLookup.testPositive30  avgt   15   21.861 ±  0.617  ns/op
SecondarySupersLookup.testPositive32  avgt   15   22.141 ±  0.609  ns/op
SecondarySupersLookup.testPositive40  avgt   15   21.632 ±  0.485  ns/op
SecondarySupersLookup.testPositive50  avgt   15   21.856 ±  0.597  ns/op
SecondarySupersLookup.testPositive60  avgt   15   22.068 ±  0.610  ns/op
SecondarySupersLookup.testPositive63  avgt   15   21.647 ±  0.496  ns/op
SecondarySupersLookup.testPositive64  avgt   15   21.847 ±  0.595  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With the above test data, we see that there is a performance decrease from testNegative55 to testNegative64 when Zbb is not available. This is because a loop is needed to count the number of 1 bits when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2144900070

From gcao at openjdk.org Mon Jun 3 11:03:01 2024
From: gcao at openjdk.org (Gui Cao)
Date: Mon, 3 Jun 2024 11:03:01 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To: 
References: 
Message-ID: 

On Mon, 3 Jun 2024 08:27:50 GMT, Hamlin Li wrote:

> Hey, is there performance comparison data when zbb is not supported? Just want to know if it can also get performance gain in that situation, then this optimization is a more generic one.
>
> The reason for asking this question is that JDK-8180450 seems to be an optimization for cache lines, so it should or could bring a benefit even if zbb is not supported?

Hi, thanks for the review. Sorry for being late. I implemented the scalar register version before, but testing showed a performance decrease, so the scalar implementation is not included in this patch.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2144904585

From rehn at openjdk.org Mon Jun 3 11:06:03 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 3 Jun 2024 11:06:03 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 31 May 2024 14:40:15 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review the patch?
>> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintenance.
>> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction.
>>
>> Thanks!
>>
>> * Tests are still running, so far so good.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   adjust accessibility

class Assembler : public AbstractAssembler {
 public:
  enum {
    instruction_size = 4,

class MacroAssembler: public Assembler {
 public:
  enum {
    instruction_size = 4,

Not sure why you have the same enum defs in both asm and masm?
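
For illustration only: a minimal sketch of the point behind the question above, using hypothetical stand-in classes rather than the real HotSpot headers. It assumes nothing beyond standard C++ name lookup: an enumerator declared once in a base class is already visible through the derived class, so a second definition in the derived class would only shadow it. Whether the patch has another reason for repeating the enum is not stated in this thread.

```cpp
// Hypothetical stand-ins; these are NOT the HotSpot classes.
class AbstractAssembler {};

class Assembler : public AbstractAssembler {
 public:
  enum {
    instruction_size = 4  // declared once, in the base class
  };
};

class MacroAssembler : public Assembler {
 public:
  // No second enum here: instruction_size is found in Assembler
  // through ordinary member name lookup.
  static int two_instructions() { return 2 * instruction_size; }
};

// The inherited enumerator is reachable through the derived class name.
static_assert(MacroAssembler::instruction_size == Assembler::instruction_size,
              "derived class sees the base class enumerator");
```

With this layout, code that writes `MacroAssembler::instruction_size` keeps compiling unchanged, and there is a single place to update the value.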
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2144910630

From mli at openjdk.org Mon Jun 3 11:12:04 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 11:12:04 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To: 
References: 
Message-ID: <5SuzTWVHimIbXSvfygBu9yXjYAwHlqECN-3WK-n93qs=.09eacb6f-ba71-4fbd-b6c0-1f5c27e293b8@github.com>

On Tue, 21 May 2024 08:31:53 GMT, Gui Cao wrote:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb:
> Original:
>
> Benchmark                                   Mode  Cnt    Score    Error  Units
> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
> SecondarySupersLookup.testNegative59        avgt   15  131.858 ±  1.066  ns/op
> SecondarySupersLookup.testNegative60...

Thanks for the explanation.
It seems this patch will also use some instructions from the zba/zbs extensions. Will they also impact the final performance?

Also, I have some minor comments first.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3484:

> 3482: }
> 3483:
> 3484: // Ensure that the inline code and the stub are using the same registers.

Can you add a comment about why the inline code and the stub should use the same registers?

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3485:

> 3483:
> 3484: // Ensure that the inline code and the stub are using the same registers.
> 3485: #define LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS \

Can `r_super_klass` and so on be passed as arguments of the macro?

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3530:

> 3528: // the bit is zero, we are certain that super_klass is not one of
> 3529: // the secondary supers.
> 3530: test_bit(t0, r_bitmap, bit);

It uses zbs if possible; I wonder what the performance data is when zbs is disabled or unsupported.
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3549:

> 3547: assert(Array::length_offset_in_bytes() == 0, "Adjust this code");
> 3548:
> 3549: shadd(result, r_array_index, r_array_base, result, LogBytesPerWord);

Similar here: this uses zba if possible, so it would be good to know the performance data when zba is not supported.

src/hotspot/cpu/riscv/riscv.ad line 10127:

> 10125: %{
> 10126: match(Set result (PartialSubtypeCheck sub (Binary super_reg super_con)));
> 10127: predicate(UseSecondarySupersTable);

Suggestion: move `predicate` before `match`.

src/hotspot/cpu/riscv/vm_version_riscv.hpp line 280:

> 278: constexpr static bool supports_recursive_lightweight_locking() { return true; }
> 279:
> 280: constexpr static bool supports_secondary_supers_table() { return true; }

If it depends on zbb, should it return UseZbb here?

-------------

PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2093273846
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624154904
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624122884
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624212710
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624214685
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624119194
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624090583

From mli at openjdk.org Mon Jun 3 11:12:05 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 11:12:05 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To: <5SuzTWVHimIbXSvfygBu9yXjYAwHlqECN-3WK-n93qs=.09eacb6f-ba71-4fbd-b6c0-1f5c27e293b8@github.com>
References: <5SuzTWVHimIbXSvfygBu9yXjYAwHlqECN-3WK-n93qs=.09eacb6f-ba71-4fbd-b6c0-1f5c27e293b8@github.com>
Message-ID:

On Mon, 3 Jun 2024 09:14:10 GMT, Hamlin Li wrote:

>> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
>> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb:
>> Original:
>>
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
>> SecondarySupersLookup.testNega...
>
> src/hotspot/cpu/riscv/vm_version_riscv.hpp line 280:
>
>> 278: constexpr static bool supports_recursive_lightweight_locking() { return true; }
>> 279:
>> 280: constexpr static bool supports_secondary_supers_table() { return true; }
>
> If it depends on zbb, should it return UseZbb here?

Ignore me, it seems this function is called before UseZbb is detected.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624111204

From gcao at openjdk.org Mon Jun 3 11:36:06 2024
From: gcao at openjdk.org (Gui Cao)
Date: Mon, 3 Jun 2024 11:36:06 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To: <5SuzTWVHimIbXSvfygBu9yXjYAwHlqECN-3WK-n93qs=.09eacb6f-ba71-4fbd-b6c0-1f5c27e293b8@github.com>
References: <5SuzTWVHimIbXSvfygBu9yXjYAwHlqECN-3WK-n93qs=.09eacb6f-ba71-4fbd-b6c0-1f5c27e293b8@github.com>
Message-ID:

On Mon, 3 Jun 2024 11:08:59 GMT, Hamlin Li wrote:

> Thanks for explanation. Seems this patch will also use some instructions in zba/zbs extension. Will they also impact the final performance?
>
> Also has some minor comments first.

Sorry, for the Banana Pi BPI-F3 board data I forgot to specify that the board doesn't enable zba, zbb, or zbs by default; here zbb was enabled manually by me. So the Banana Pi BPI-F3 JMH data here was measured without zba and zbs enabled.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2144963984

From mli at openjdk.org Mon Jun 3 11:42:28 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 11:42:28 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v5]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you help to review the patch?
> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is bad for readability and maintenance.
> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in the reverse direction.
>
> Thanks!
>
> * Tests are still running, so far so good.

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  move

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19459/files
  - new: https://git.openjdk.org/jdk/pull/19459/files/ef11d6a7..86c1787b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=03-04

  Stats: 443 lines in 1 file changed: 220 ins; 223 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19459.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19459/head:pull/19459

PR: https://git.openjdk.org/jdk/pull/19459

From mli at openjdk.org Mon Jun 3 11:42:28 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 11:42:28 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To:
References: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com>
Message-ID: <44gWC-sf6imga5FJiLYEhZj97Be7bKQoo6-EHvSsUaA=.addf2e32-7af8-4e77-8f5d-45250adb02a7@github.com>

On Mon, 3 Jun 2024 10:48:09 GMT, Robbin Ehn wrote:

> > I'm not sure. If this is a problem in this patch, then in the original implementation MASM.hpp includes nativeInst.hpp, so wouldn't it cause a similar issue?
> > On the other hand, will unused methods in a header file still exist in the compiled binary?
>
> It's about having a readable header; having a long private method, not part of the API, in the header clutters it.

Not sure, as it's in its own section, not mixed with other sections. But, it's fine, I'll move it to the bottom of the file.

> They will not exist, but compiling away methods is not free as it prolongs the compile time.

As I said, MASM.hpp included nativeInst.hpp before, so it's not an issue (or not a new issue)?

>
> > The reason I do it this way is that I just want to refactor the necessary things (the principle is that only nativeInst depends on macroAssembler, not the reverse); for other things, I try to keep them as they were. So if there are other optimization opportunities, it's better to handle them in separate PRs.
>
> If this is a simple move of code, then I suggest you wait until my patch, as that changes names and doubles the methods which you want to move.

It's just a merge, either in this PR or that PR, not that complicated?

>
> Also now NativeCall::instruction_size, which is used in shared code, is in nativeInst, but NativeInstruction::instruction_size is (duplicated) in MASM. EDIT, better example:
>
> ```
> CodeBuffer* PhaseOutput::init_buffer() {
> ...
> int pad_req = NativeCall::instruction_size;
> ```
>
> So there is duplication of all sizes in both nativeInst and masm.

I don't get the point, can you clarify?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2144964483

From mli at openjdk.org Mon Jun 3 11:42:28 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 11:42:28 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To:
References:
Message-ID: <0B_3_1WFJR2_gMtSgtHxqAKqfXzNvh6RLqxjHfw8qcs=.0ec5a21c-ffff-4785-b88a-09ab2aa6f24a@github.com>

On Mon, 3 Jun 2024 11:03:35 GMT, Robbin Ehn wrote:

> ```
> class Assembler : public AbstractAssembler {
>  public:
>
>   enum {
>     instruction_size = 4,
>
> class MacroAssembler: public Assembler {
>
>  public:
>   enum {
>     instruction_size = 4,
> ```
>
> Not sure why you have the same enum defs in both asm and masm?

Yeah, it's redundant, I'll remove it. Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2144965522

From rehn at openjdk.org Mon Jun 3 12:02:03 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 3 Jun 2024 12:02:03 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To: <44gWC-sf6imga5FJiLYEhZj97Be7bKQoo6-EHvSsUaA=.addf2e32-7af8-4e77-8f5d-45250adb02a7@github.com>
References: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com> <44gWC-sf6imga5FJiLYEhZj97Be7bKQoo6-EHvSsUaA=.addf2e32-7af8-4e77-8f5d-45250adb02a7@github.com>
Message-ID:

On Mon, 3 Jun 2024 11:33:17 GMT, Hamlin Li wrote:

> I don't get the point, can you clarify?

I don't think de-coupling by duplicating the constant is better than keeping the dependency. As long as the shared code expects the NativeXXX::instruction_size to work we can't move them.

E.g.
(partly from my patch) nativeInst.cpp

    if (NativeShortCall::is_at(addr)) {
      return NativeShortCall::instruction_size;
    } else if (NativeFarCall::is_at(addr)) {
      return NativeFarCall::instruction_size;
    }

In masm related code:

    int MacroAssembler::max_patchable_far_call_stub_size() {
      // Max stub size: alignment nop, TrampolineStub.
      if (UseTrampolines) {
        return NativeInstruction::instruction_size + NativeShortCall::trampoline_size;

This would now look like:

    int MacroAssembler::max_patchable_far_call_stub_size() {
      // Max stub size: alignment nop, TrampolineStub.
      if (UseTrampolines) {
        return Masm::instruction_size + Masm::native_short_call_trampoline_size;

Using a different constant here is not an improvement.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2145012844

From mli at openjdk.org Mon Jun 3 12:14:02 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 12:14:02 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To:
References: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com> <44gWC-sf6imga5FJiLYEhZj97Be7bKQoo6-EHvSsUaA=.addf2e32-7af8-4e77-8f5d-45250adb02a7@github.com>
Message-ID:

On Mon, 3 Jun 2024 11:59:38 GMT, Robbin Ehn wrote:

> > I don't get the point, can you clarify?
>
> I don't think de-coupling by duplicating the constant is better than keeping the dependency. As long as the shared code expects the NativeXXX::instruction_size to work we can't move them.
>
> E.g. (partly from my patch) nativeInst.cpp
>
> ```
> if (NativeShortCall::is_at(addr)) {
>   return NativeShortCall::instruction_size;
> } else if (NativeFarCall::is_at(addr)) {
>   return NativeFarCall::instruction_size;
> }
> ```
>
> In masm related code:
>
> ```
> int MacroAssembler::max_patchable_far_call_stub_size() {
>   // Max stub size: alignment nop, TrampolineStub.
>   if (UseTrampolines) {
>     return NativeInstruction::instruction_size + NativeShortCall::trampoline_size;
> ```
>
> This would now look like:
>
> ```
> int MacroAssembler::max_patchable_far_call_stub_size() {
>   // Max stub size: alignment nop, TrampolineStub.
>   if (UseTrampolines) {
>     return Masm::instruction_size + Masm::native_short_call_trampoline_size;
> ```
>
> Using a different constant here is not an improvement.

What I want is for MASM to have no dependency on nativeInst. That's one of the principles I stick to in this PR: a dependency on nativeInst from MASM makes no sense from the point of view of code logic, and a bidirectional dependency complicates the code, which also hurts readability and maintenance. And if you are going to do it, I hope you do it thoroughly.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2145035926

From rehn at openjdk.org Mon Jun 3 12:46:03 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 3 Jun 2024 12:46:03 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To:
References: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com> <44gWC-sf6imga5FJiLYEhZj97Be7bKQoo6-EHvSsUaA=.addf2e32-7af8-4e77-8f5d-45250adb02a7@github.com>
Message-ID: <-m-kmeuLKAS0DOR6M07vh43HrDTQdBsaXKxxdeZVULE=.1af20bd2-6c88-4291-a10c-cc207bf2ba75@github.com>

On Mon, 3 Jun 2024 12:11:30 GMT, Hamlin Li wrote:

> What I want is to let MASM have no dependency on nativeInst, that's one of the principles I stick to in this PR, as a dependency on nativeInst from MASM makes no sense from the point of view of code logic, and a bidirectional dependency makes the code complicated, which will also lead to issues of readability and maintenance. And if you are going to do it, I hope to do it thoroughly.

I'm not disagreeing with your intent.
I'm just saying that duplicating the constants is not a good solution, and the code in some cases will be worse.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2145110170

From duke at openjdk.org Mon Jun 3 12:53:09 2024
From: duke at openjdk.org (Yuri Gaevsky)
Date: Mon, 3 Jun 2024 12:53:09 GMT
Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic
In-Reply-To:
References:
Message-ID:

On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky wrote:

> Hello All,
>
> Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported.
>
> Thank you,
> -Yuri Gaevsky
>
> **Correctness checks:**
> hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4.

.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2145121551

From duke at openjdk.org Mon Jun 3 12:53:10 2024
From: duke at openjdk.org (Yuri Gaevsky)
Date: Mon, 3 Jun 2024 12:53:10 GMT
Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic
In-Reply-To:
References:
Message-ID: <8ezpnHbxAf_KrYnX48GVwG_MEVjeOWmkzH_BzA54Vp4=.69195966-1d94-4d5a-811d-94e91155657d@github.com>

On Fri, 3 May 2024 18:58:55 GMT, Ludovic Henry wrote:

>>> Hi, Do you have a plan to implement the intrinsic `VectorCmpMasked`? It's part of `vectorizedMismatch`
>>
>> Hi @Hamlin-Li,
>>
>> I don't have such a plan for the moment. Why do you think it should be a part of the `_vectorizedMismatch` intrinsic? The similar [fix](https://github.com/openjdk/jdk/commit/b05c40ca3b5fd34cbbc7a9479b108a4ff2c099f1?diff=split&w=0) for X64 ([JDK-8266951](https://bugs.openjdk.org/browse/JDK-8266951)) looks like a natural enhancement/follow-up for the original intrinsic functionality.
>
> @ygaevsky the `VectorCmpMasked` is to support partial inlining for small arrays: https://github.com/openjdk/jdk/blob/b33096f887108c3d7e1f4e62689c2b10401234fa/src/hotspot/share/opto/library_call.cpp#L6372-L6411
>
> It very much complements this intrinsic and allows it to focus on larger arrays.

> @luhenry: I fully agree that we need `VectorCmpMasked` but I just want to understand why it couldn't be implemented as a follow-up (similarly to x64).

Does anybody have any comments on this? Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2145125119

From rehn at openjdk.org Mon Jun 3 12:55:17 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 3 Jun 2024 12:55:17 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2]
In-Reply-To:
References:
Message-ID: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com>

> Hi all, please consider!
>
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application, or one that runs for a very short time, we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
>
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
>
> Code stream:
> JAL
> Stubs:
> AUIPC
> LD
> JALR
>
>
>
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
>
> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
>
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
>
>
> An experimental option for turning trampolines back on exists.
>
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop-ing out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>
> fop (msec)           2239 | 2128  = 0.950424
> h2 (msec)           18660 | 16594 = 0.889282
> jython (msec)       22022 | 21925 = 0.995595
> luindex (msec)       2866 | 2842  = 0.991626
> lusearch (msec)      4108 | 4311  = 1.04942
> lusearch-fix (msec)  4406 | 4116  = 0.934181
> pmd (msec)           5976 | 5897  = 0.98678
> jython (msec)       22022 | 21925 = 0.995595
> Avg: 0.974112
> fop(xcomp) (msec)    2721 | 2714  = 0.997427
> h2(xcomp) (msec)    37719 | 38004 = 1.00756
> jython(xcomp) ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into 8332689
 - Remove accidental files
 - Remove accidental files
 - Baseline

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19453/files
  - new: https://git.openjdk.org/jdk/pull/19453/files/41e576cc..3c5db819

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=00-01

  Stats: 9920 lines in 326 files changed: 5604 ins; 3137 del; 1179 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From stefank at openjdk.org Mon Jun 3 13:38:21 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 3 Jun 2024 13:38:21 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v121]
In-Reply-To:
References:
Message-ID:

On Thu, 30 May 2024 13:19:41 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
>> Our Treap-based approach in this patch gives a performance bo...
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Check for nullptr afterwards

ZGC reserves virtual memory *before* committing "physical" memory. Reporting the physical memory to also be "reserved" memory sounds misleading to me.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2145222888

From mli at openjdk.org Mon Jun 3 15:27:05 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 15:27:05 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4]
In-Reply-To: <-m-kmeuLKAS0DOR6M07vh43HrDTQdBsaXKxxdeZVULE=.1af20bd2-6c88-4291-a10c-cc207bf2ba75@github.com>
References: <2rhfcNiCyomqQ5Ibh5xWJv6uEYkdme7gS8ViGzOTjww=.8bd6bffe-b72b-47ad-9939-a07e1644a84e@github.com> <44gWC-sf6imga5FJiLYEhZj97Be7bKQoo6-EHvSsUaA=.addf2e32-7af8-4e77-8f5d-45250adb02a7@github.com> <-m-kmeuLKAS0DOR6M07vh43HrDTQdBsaXKxxdeZVULE=.1af20bd2-6c88-4291-a10c-cc207bf2ba75@github.com>
Message-ID:

On Mon, 3 Jun 2024 12:43:27 GMT, Robbin Ehn wrote:

> I'm not disagreeing with your intent. I'm just saying that duplicating the constants is not a good solution, and the code in some cases will be worse.

Thanks for clarifying, I get your point. But I still think we'd better have no dependency from macroAssembler to nativeInst. And I don't see an issue in the following code (from your example above):

    int MacroAssembler::max_patchable_far_call_stub_size() {
      // Max stub size: alignment nop, TrampolineStub.
      if (UseTrampolines) {
        return Masm::instruction_size + Masm::native_short_call_trampoline_size;

`MacroAssembler` should only know fields or methods in itself, not in nativeInst. If some code needs to access both `MacroAssembler` and `nativeInst`, it should know both of them. And if I must choose between bidirectional dependencies and some data duplication, I prefer the latter.
And in the existing nativeInst code there is already some similar data duplication, e.g.: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/nativeInst_riscv.hpp#L446

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2145503718

From rehn at openjdk.org Mon Jun 3 17:13:03 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 3 Jun 2024 17:13:03 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v5]
In-Reply-To:
References:
Message-ID: <07Pn2aqEtRUmQihll9YlDESTwYOIjRTxUmnN5swz87s=.afe47194-9eaf-4c6c-ae57-ad7b470867b2@github.com>

On Mon, 3 Jun 2024 11:42:28 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review the patch?
>> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is bad for readability and maintenance.
>> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in the reverse direction.
>>
>> Thanks!
>>
>> * Tests are still running, so far so good.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   move

Yeah, that is where we disagree: as the constants need to be duplicated, there is a dependency even if you can now remove the header. If others like it, sure. If it is possible for you to wait for the other PR, please do so.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2145727575

From kvn at openjdk.org Mon Jun 3 18:04:28 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 3 Jun 2024 18:04:28 GMT
Subject: RFR: 8333226: Regressions 2-3% in Compress ZGC after 8331253
Message-ID:

Revert [JDK-8331253](https://bugs.openjdk.org/browse/JDK-8331253) changes [#3383ad63](https://git.openjdk.org/jdk/commit/3383ad6397d5a2d8fb232ffd3e29a54e0b37b686) to avoid regression. And convert the `nmethod::_skipped_instructions_size` field to `int` type to address the original JDK-8331253 issue.
Tested tier1-3, stress, xcomp and performance.

Note: You may see some regressions, but performance returns to its state before JDK-8331253. I may look again in the future at the changes I made to calculate skipped instructions.

-------------

Commit messages:
 - 8333226: Regressions 2-3% in Compress ZGC after 8331253

Changes: https://git.openjdk.org/jdk/pull/19531/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19531&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333226
  Stats: 43 lines in 9 files changed: 6 ins; 24 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/19531.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19531/head:pull/19531

PR: https://git.openjdk.org/jdk/pull/19531

From mli at openjdk.org Mon Jun 3 18:14:04 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 3 Jun 2024 18:14:04 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To:
References:
Message-ID:

On Tue, 21 May 2024 08:31:53 GMT, Gui Cao wrote:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
> Original:
>
> Benchmark                                   Mode  Cnt    Score    Error  Units
> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
> SecondarySupersLookup.testNegative59        avgt   15  131.858 ±  1.066  ns/op
> SecondaryS...

Some more comments.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3525:

> 3523: mv(result, 1);
> 3524:
> 3525: ld(r_bitmap, Address(r_sub_klass, Klass::bitmap_offset()));

~~when bitmap is `SECONDARY_SUPERS_BITMAP_FULL`, can we skip the hash+bitmap lookup, and go to linear lookup (e.g. `repne_scan` below)?~~
Ignore it, I think that logic is in `lookup_secondary_supers_table_slow_path`, so this seems fine.
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3605:

> 3603: LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS;
> 3604:
> 3605: Label L_matched, L_fallthrough, L_huge;

Seems to me, `L_bitmap_full` is more meaningful than `L_huge`.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3607:

> 3605: Label L_matched, L_fallthrough, L_huge;
> 3606:
> 3607: // Make sure that result is nonzero

Value `1` of `result` here means mismatch? If so, the comment is confusing. I think something like "initialize result value to 1 which means mismatch." would be more meaningful.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3619:

> 3617: // The bitmap is full to bursting.
> 3618: // Implicit invariant: BITMAP_FULL implies (length > 0)
> 3619: assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), "");

Maybe the assert message could be "Adjust this code" too?

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3620:

> 3618: // Implicit invariant: BITMAP_FULL implies (length > 0)
> 3619: assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), "");
> 3620: addi(t0, r_bitmap, (u1)1);

A comment like "check if bitmap is `SECONDARY_SUPERS_BITMAP_FULL`" would make this easier to understand.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3659:

> 3657: // complexity?
> 3658: bind(L_huge);
> 3659: repne_scan(r_array_base, r_super_klass, r_array_length, t0);

~~Seems this will start to scan at 0 index in r_array_base? maybe we just need to start at index 64?~~
Ignore it, I misunderstood the code: it's only for the bitmap-full case, and it starts to scan from index 0.
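To spell out what the `addi(t0, r_bitmap, (u1)1)` line is presumably checking, here is a minimal sketch in plain C++. It assumes, as the quoted assert states, that the full bitmap is the all-ones value, and that the `addi` is followed by a branch on zero. The names `kBitmapFull` and `is_bitmap_full` are hypothetical and are not part of HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant, not HotSpot code: models
// Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0) on a 64-bit target.
constexpr uint64_t kBitmapFull = ~uint64_t{0};

// Hypothetical helper: models "add 1, then test for zero".
// Unsigned addition wraps, so only the all-ones value gives a sum of 0.
inline bool is_bitmap_full(uint64_t bitmap) {
  return bitmap + 1 == 0;
}
```

If that reading is right, the suggested comment "check if bitmap is `SECONDARY_SUPERS_BITMAP_FULL`" describes exactly this wrap-around test.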
-------------

PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2094397977
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624826855
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624854699
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624836606
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624832722
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624859553
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1624768434

From sspitsyn at openjdk.org Mon Jun 3 19:01:36 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Mon, 3 Jun 2024 19:01:36 GMT
Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v3]
In-Reply-To:
References:
Message-ID:

On Sat, 1 Jun 2024 00:22:45 GMT, Alex Menkov wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> review: refactored def and use of process_pending_interp_only()
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 40:
>
>> 38:
>> 39: static const char* CTHREAD_NAME_START = "ForkJoinPool";
>> 40: static const size_t CTHREAD_NAME_START_LEN = (int)strlen("ForkJoinPool");
>
> `(int)` cast is not needed

Thanks, fixed now.

> test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 58:
>
>> 56: cthreads[ct_cnt++] = jni->NewGlobalRef(thread);
>> 57: }
>> 58: deallocate(jvmti, jni, (void*)tname);
>
> cast to `void*` is not needed

Why do you think the cast is not needed?
This is the `deallocate()` function in `jvmti_common.hpp`:

    static void deallocate(jvmtiEnv *jvmti, JNIEnv* jni, void* ptr) {
      jvmtiError err = jvmti->Deallocate((unsigned char*)ptr);
      check_jvmti_status(jni, err, "deallocate: error in JVMTI Deallocate call");
    }

> test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 96:
>
>> 94: }
>> 95: jvmtiError err = jvmti->Deallocate((unsigned char*)carrier_threads);
>> 96: check_jvmti_status(jni, err, "deallocate: error in JVMTI Deallocate call");
>
> replace with `deallocate(jvmti, jni, carrier_threads);` ?

Thanks, fixed now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1624909747
PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1624911862
PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1624914083

From iklam at openjdk.org Mon Jun 3 19:16:34 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Mon, 3 Jun 2024 19:16:34 GMT
Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3]
In-Reply-To:
References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <2cV0qix4YBr6H58RLaKdtiRmmiQ222IFHGH8kw-bWCY=.278bc880-4b3e-481c-90df-dd45a94f7822@github.com>
Message-ID:

On Fri, 31 May 2024 18:43:49 GMT, Ioi Lam wrote:

>> The current algorithm says:
>>
>> for each bytecode in each method:
>>   switch(bytecode) {
>>     case getfield:
>>     case putfield:
>>       InterpreterRuntime::resolve_get_put(bc, raw_index, mh, cp, false /*initialize_holder*/, CHECK);
>>       break;
>>     ....
>>   }
>> What I'm proposing is:
>>
>> for each ResolvedFieldEntry
>>   bool success = InterpreterRuntime::resolve_get_put(getfield, raw_index, nullptr /* mh */, cp, false /*initialize_holder*/, CHECK);
>>   if (success) {
>>     // also resolve for put
>>     InterpreterRuntime::resolve_get_put(putfield, raw_index, nullptr /* mh */, cp, false /*initialize_holder*/, CHECK);
>>   }
>>
>>
>> The `method` parameter is not critical as the "current" algorithm attempts resolution with multiple methods - once for each method that references the ResolvedFieldEntry. The resolution logic already has to handle dealing with different rules for different types of methods (ie `` & ``) for normal execution and "knows" not to resolve entries (like puts of final fields) regardless of the method as they need to do additional runtime checks on every access.
>>
>> The same will apply to invoke bytecodes later.... it feels safer to do only what the bytecodes in some method have asked for but the runtime already has to be robust against different kinds of gets/puts or invokes targeting the same cp entries. By eagerly resolving we're not giving up any safety.
>>
>> If you really want to only resolve the exact cases (ie: gets not puts, etc) that were resolved in the training run, then we need to write out as part of the classlist more explicitly what needs to be resolved:
>> ie:
>>
>> @cp_resolved_gets 4, 7 8
>> @cp_resolved_puts 7 8 10
>
> This makes sense. I will try to prototype it in the Leyden repo and then update this PR.

I tried skipping the `methodHandle` parameter to `InterpreterRuntime::resolve_get_put` but it's more complicated than I thought.

1. The `fieldDescriptor::has_initialized_final_update()` will return true IFF the class has `putfield` bytecodes to a final field outside of `` methods. See [here](https://github.com/openjdk/jdk/blob/9686e804a2b058955ff88149c54a0a7896c0a2eb/src/hotspot/share/interpreter/rewriter.cpp#L463)
2. When `InterpreterRuntime::resolve_get_put` is called for a `putfield`, it adds `putfield` to the `ResolvedFieldEntry` only if `fieldDescriptor::has_initialized_final_update()` is false. See [here](https://github.com/openjdk/jdk/blob/9686e804a2b058955ff88149c54a0a7896c0a2eb/src/hotspot/share/interpreter/interpreterRuntime.cpp#L703)
3. `InterpreterRuntime::resolve_get_put` calls `LinkResolver::resolve_field()`, which always checks if the `methodHandle` is `` or not, without consulting `fieldDescriptor::has_initialized_final_update()`. See [here](https://github.com/openjdk/jdk/blob/9686e804a2b058955ff88149c54a0a7896c0a2eb/src/hotspot/share/interpreter/linkResolver.cpp#L1040)

(2) is an optimization -- if a method sets final fields only inside `` methods, we should fully resolve the `putfield` bytecodes. Otherwise every such `putfield` will result in a VM call.

(3) is for correctness -- make sure that only `` can modify final fields.

I am pretty sure (2) and (3) are equivalent. I.e., we should check against the method in (3) only if `fieldDescriptor::has_initialized_final_update()` is true. However, (3) is security-related code, so I don't want to change it inside an optimization PR like this one.

Without fixing that, I cannot call `InterpreterRuntime::resolve_get_put` with a null `methodHandle`, as it will hit the assert.

This goes back to my original point -- I'd rather do something stupid but correct (call the existing APIs and live with the existing behavior) than try to analyze the resolution code and see if we can skip certain checks.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1624928922

From iklam at openjdk.org Mon Jun 3 20:48:33 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Mon, 3 Jun 2024 20:48:33 GMT
Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3]
In-Reply-To:
References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com>
Message-ID:

On Mon, 3 Jun 2024 08:30:46 GMT, David Holmes wrote:

>> Thanks @iwanowww for chiming in.
>> I've changed `arguments.cpp` back to the first version.
>
> I'm getting confused.
>
> The ProfileClassLinkage flag should control whether the counters are initialized and used.
>
> The -Xlog:init (perhaps with a better name/tag!) should control whether they get printed - in theory you could choose to print under any logging setting.
>
> I don't think either flag should imply/force the setting of the other.

It's about usability. `-Xlog:init` means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs. We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters like:

    java -Xlog:init -XX:+ProfileClassLinkage -XX:+ProfileAAA -XX:+ProfileBBB ....

Also, `-Xlog:init` doesn't force all the counters to be used. It just changes the default value of the selection:

    if (log_is_enabled(Info, init)) {
      FLAG_SET_ERGO_IF_DEFAULT(ProfileClassLinkage, true);
      FLAG_SET_ERGO_IF_DEFAULT(ProfileAAA, true);
      FLAG_SET_ERGO_IF_DEFAULT(ProfileBBB, true);
    }

The user could disable certain counters by:

    java -Xlog:init -XX:-ProfileAAA

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1625030585

From iklam at openjdk.org Mon Jun 3 21:23:59 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Mon, 3 Jun 2024 21:23:59 GMT
Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v5]
In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com>
References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com>
Message-ID:

> ### Overview
>
> This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so.
>
> I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`,
> - `B` is the same class as `A`; or
> - `B` is a supertype of `A`; or
> - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader.
>
> Under these conditions, whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time.
>
> Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`.
>
> (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.)
>
> ### Static CDS Archive
>
> This feature is implemented in three steps for static CDS archive dump:
>
> 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run:
>
> @cp java/util/Objects 2 19 106
>
> 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries.
>
> 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state.
>
> ### Dynamic CDS Archive
>
> When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written.
>
> ### Limitations
>
> - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future.
> - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app...
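As a reading aid for the safety rule in the Overview quoted above, here is a minimal sketch in plain C++. It uses a toy class model: `ToyClass`, `is_supertype_of` and `may_archive_resolved_field` are hypothetical names for illustration only and do not exist in HotSpot, and the actual check in the PR may well be structured differently.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a loaded class; these are NOT HotSpot types.
struct ToyClass {
  const ToyClass* super = nullptr;          // direct superclass, if any
  std::vector<const ToyClass*> interfaces;  // direct superinterfaces
  bool loaded_by_boot_loader = false;
  bool is_vm_class = false;                 // models membership in vmClasses
};

// True if `b` is a proper supertype of `a` (walks superclasses and interfaces).
inline bool is_supertype_of(const ToyClass* b, const ToyClass* a) {
  if (a->super != nullptr && (a->super == b || is_supertype_of(b, a->super))) {
    return true;
  }
  for (const ToyClass* i : a->interfaces) {
    if (i == b || is_supertype_of(b, i)) return true;
  }
  return false;
}

// Models the three conditions from the Overview: a resolved FieldRef in `a`
// that refers to a field of `b` may stay resolved in the archive only if the
// field is non-static and one of the conditions holds.
inline bool may_archive_resolved_field(const ToyClass& a, const ToyClass& b,
                                       bool field_is_static) {
  if (field_is_static) return false;        // static fields are not archived
  if (&a == &b) return true;                // B is the same class as A
  if (is_supertype_of(&b, &a)) return true; // B is a supertype of A
  return b.is_vm_class && a.loaded_by_boot_loader;
}
```

In terms of step 3 of the static dump, any resolved entry for which such a predicate returns false would be the kind of entry that gets reverted to the unresolved state.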
Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - Added test case for safety with putfield against final fields (related to JDK-8157181) - Moved the test ResolvedConstants.java to resolvedConstants, as we will have more tests cases in this area ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19355/files - new: https://git.openjdk.org/jdk/pull/19355/files/17a1ce62..58e08e18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=03-04 Stats: 161 lines in 3 files changed: 160 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From cjplummer at openjdk.org Mon Jun 3 22:05:52 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 3 Jun 2024 22:05:52 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: On Mon, 3 Jun 2024 09:58:38 GMT, Serguei Spitsyn wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: consistency and stylistical corrections src/hotspot/share/prims/jvmti.xml line 1007: > 1005: explicitly deallocate. This is indicated in the individual > 1006: function descriptions. Empty lists, arrays, sequences, etc are > 1007: returned as a null pointer (C NULL or C++ nullptr). 
Why describe what is meant by a "null pointer" here when it is not done elsewhere? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1625105918 From sspitsyn at openjdk.org Mon Jun 3 22:55:24 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Jun 2024 22:55:24 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v4] In-Reply-To: References: Message-ID: > Please, review the following `interp-only` issue related to carrier threads. > There are 3 problems fixed here: > - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. > - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. > - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. > > The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. 
> > Testing: > - Ran new test case locally > - Ran mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: get rid of unneeded casts in new test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19438/files - new: https://git.openjdk.org/jdk/pull/19438/files/19e4d8fa..01304354 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19438&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19438&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19438/head:pull/19438 PR: https://git.openjdk.org/jdk/pull/19438 From sspitsyn at openjdk.org Mon Jun 3 22:55:24 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 3 Jun 2024 22:55:24 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 23:55:20 GMT, Serguei Spitsyn wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. 
>> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. >> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: refactored def and use of process_pending_interp_only() > I'm not sure I follow the test logic. Its summary says "Verifies that MethodExit events are delivered on both carrier and virtual threads", but now it just ignores MethodExit requested for carrier thread in breakpoint_hit1. Then there is no sense to request the event on carrier thread. > Per the test summary I'd expect the test should test MethodExit for carrier thread, but then java part needs to force unmount. As we already agreed I've filed the cleanup test bug: [8333459](https://bugs.openjdk.org/browse/JDK-8333459) cleanup and check MethodExit events are posted on carrier threads in MethodExitTest ------------- PR Comment: https://git.openjdk.org/jdk/pull/19438#issuecomment-2146254345 From dholmes at openjdk.org Mon Jun 3 23:10:25 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 3 Jun 2024 23:10:25 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v2] In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: > Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. > > Adds unit testing for the specialized cases. > > See JBS for discussion of other suggestions. 
> > Testing: - tiers 1-4 > > Thanks David Holmes has updated the pull request incrementally with two additional commits since the last revision: - Use constexpr and add missing definition that some compilers complain about - use strncmp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19512/files - new: https://git.openjdk.org/jdk/pull/19512/files/6b5aa1c9..e0b0693a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=00-01 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19512/head:pull/19512 PR: https://git.openjdk.org/jdk/pull/19512 From amenkov at openjdk.org Mon Jun 3 23:31:12 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 3 Jun 2024 23:31:12 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v4] In-Reply-To: References: Message-ID: <3noeBAjVIGN-pBWZvASA4c9hMMs0agvhinOAPbUy1-8=.ab468618-a05c-4b20-a61d-463a698034c0@github.com> On Mon, 3 Jun 2024 22:55:24 GMT, Serguei Spitsyn wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. 
However, it can happen if the virtual thread gets unmounted. >> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. >> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: get rid of unneeded casts in new test Marked as reviewed by amenkov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19438#pullrequestreview-2095027750 From gcao at openjdk.org Tue Jun 4 02:06:18 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 4 Jun 2024 02:06:18 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: > Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. > This optimization depends on availability of the Zbb extension which has the cpop instruction. > > ### Correctness testing: > > - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) > - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) > - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs > Original: > > Benchmark Mode Cnt Score Error Units > SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op > SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 
0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ? 1.066 ns/op > SecondaryS... 
Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Fix for Hamlin comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19320/files - new: https://git.openjdk.org/jdk/pull/19320/files/53d34329..c9110326 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=00-01 Stats: 30 lines in 2 files changed: 7 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From dholmes at openjdk.org Tue Jun 4 02:09:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Jun 2024 02:09:13 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v3] In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: <7BwrnI6pyDmsjbYvkV4SrWh6ObNj38Vfla7Lyx3u-Rc=.7e9a3bf9-60b2-46b7-ab45-324ac646400d@github.com> > Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. > > Adds unit testing for the specialized cases. > > See JBS for discussion of other suggestions. 
> > Testing: - tiers 1-4 > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: revert constexpr usage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19512/files - new: https://git.openjdk.org/jdk/pull/19512/files/e0b0693a..eeee7f57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19512/head:pull/19512 PR: https://git.openjdk.org/jdk/pull/19512 From gcao at openjdk.org Tue Jun 4 02:11:35 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 4 Jun 2024 02:11:35 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3] In-Reply-To: References: Message-ID: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com> > Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. > This optimization depends on availability of the Zbb extension which has the cpop instruction. > > ### Correctness testing: > > - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) > - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) > - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs > Original: > > Benchmark Mode Cnt Score Error Units > SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op > SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 
0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ? 1.066 ns/op > SecondaryS... 
Gui Cao has updated the pull request incrementally with one additional commit since the last revision:

  Fix Code format

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19320/files
  - new: https://git.openjdk.org/jdk/pull/19320/files/c9110326..0c7c9f59

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19320.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320

PR: https://git.openjdk.org/jdk/pull/19320

From duke at openjdk.org Tue Jun 4 02:16:03 2024
From: duke at openjdk.org (kuaiwei)
Date: Tue, 4 Jun 2024 02:16:03 GMT
Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v7]
In-Reply-To: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com>
References: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com>
Message-ID: <00Y9wQWgMDLWbKewlbt8ei25UMuYcNLKH74Cu9TuHQ8=.adf53c3e-8f49-4845-acb9-7a13f99c9ae1@github.com>

On Mon, 3 Jun 2024 06:45:22 GMT, kuaiwei wrote:

>> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues:
>> 1 It shows regressions on some platforms, such as Apple silicon on macOS
>> 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld"
>>
>> It can be fixed by:
>> 1 Enable AlwaysMergeDMB by default, and disable it only on architectures where we see a performance improvement (N1 or N2)
>> 2 Check the special pattern and merge the subsequent dmb.
>>
>> It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these.
>>
>> This patch still has an unhandled case: for instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and cannot merge all three.
Because when emitting dmb.ish, merging all previous dmbs would shrink the code buffer. I think that may break some assumptions, and I don't think it is a common pattern.
>>
>> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that used a state machine for merging. But it looks risky to keep instructions pending during emission.
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use constexpr for test encoding

Thanks for your review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2146437869

From gcao at openjdk.org Tue Jun 4 02:19:14 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 4 Jun 2024 02:19:14 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well
In-Reply-To:
References:
Message-ID:

On Mon, 3 Jun 2024 08:27:50 GMT, Hamlin Li wrote:

>> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
>> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) with UseZbb enabled and UseZba, UseZbs not enabled
>> Original:
>>
>> Benchmark Mode Cnt Score Error Units
>> SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op
>> SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 0.239 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 21.695 ±
0.458 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 24.855 ± 0.664 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 29.719 ± 0.385 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 35.856 ± 0.629 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 126.796 ± 0.916 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op
>> Seco...
>
> Hey, is there performance comparison data for when Zbb is not supported? I just want to know whether it also brings a performance gain in that situation; if so, this optimization is a more generic one.
>
> The reason for asking is that JDK-8180450 seems to be an optimization for cache lines, so it should or could bring a benefit even if Zbb is not supported?

@Hamlin-Li : Hi Hamlin, I've fixed all of the above comments. I had enabled Zba, Zbb and Zbs and run JMH before, and performance improved a bit compared to enabling only Zbb. I'll re-run the JMH benchmarks with Zba, Zbb and Zbs enabled today and will update here when I'm done.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2146441422

From vlivanov at openjdk.org Tue Jun 4 03:50:06 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 4 Jun 2024 03:50:06 GMT
Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3]
In-Reply-To:
References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com>
Message-ID: <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com>

On Mon, 3 Jun 2024 20:45:39 GMT, Ioi Lam wrote:

>> I'm getting confused.
>>
>> The ProfileClassLinkage flag should control whether the counters are initialized and used.
>>
>> The -Xlog:init (perhaps with a better name/tag!) should control whether they get printed - in theory you could choose to print under any logging setting.
>>
>> I don't think either flag should imply/force the setting of the other.
>
> It's about usability. `-Xlog:init` means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs. We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters like:
>
>
> java -Xlog:init -XX:+ProfileClassLinkage -XX:+ProfileAAA -XX:+ProfileBBB ....
>
>
> Also, `-Xlog:init` doesn't force all the counters to be used. It just changes the default value of the selection:
>
>
> if (log_is_enabled(Info, init)) {
>   FLAG_SET_ERGO_IF_DEFAULT(ProfileClassLinkage, true);
>   FLAG_SET_ERGO_IF_DEFAULT(ProfileAAA, true);
>   FLAG_SET_ERGO_IF_DEFAULT(ProfileBBB, true);
> }
>
>
> The user could disable certain counters by:
>
>
> java -Xlog:init -XX:-ProfileAAA

> The -Xlog:init (perhaps with a better name/tag!)

I'm all for a better naming scheme. Any suggestions?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1625302912 From dholmes at openjdk.org Tue Jun 4 04:51:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Jun 2024 04:51:17 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: On Mon, 3 Jun 2024 22:03:13 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: consistency and stylistical corrections > > src/hotspot/share/prims/jvmti.xml line 1007: > >> 1005: explicitly deallocate. This is indicated in the individual >> 1006: function descriptions. Empty lists, arrays, sequences, etc are >> 1007: returned as a null pointer (C NULL or C++ nullptr). > > Why describe what is meant by a "null pointer" here when it is not done elsewhere? The intent is to provide a definition of what a null pointer is, for both C and C++ programs. Is there a better place to do that so that elsewhere the spec can simply to refer to "a null pointer" or "null"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1625338781 From dholmes at openjdk.org Tue Jun 4 05:17:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Jun 2024 05:17:05 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> Message-ID: On Tue, 4 Jun 2024 03:47:19 GMT, Vladimir Ivanov wrote: >> It's about usability. 
`-Xlog:init` means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs. We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters like:
>>
>>
>> java -Xlog:init -XX:+ProfileClassLinkage -XX:+ProfileAAA -XX:+ProfileBBB ....
>>
>>
>> Also, `-Xlog:init` doesn't force all the counters to be used. It just changes the default value of the selection:
>>
>>
>> if (log_is_enabled(Info, init)) {
>>   FLAG_SET_ERGO_IF_DEFAULT(ProfileClassLinkage, true);
>>   FLAG_SET_ERGO_IF_DEFAULT(ProfileAAA, true);
>>   FLAG_SET_ERGO_IF_DEFAULT(ProfileBBB, true);
>> }
>>
>>
>> The user could disable certain counters by:
>>
>>
>> java -Xlog:init -XX:-ProfileAAA
>
>> The -Xlog:init (perhaps with a better name/tag!)
>
> I'm all for a better naming scheme. Any suggestions?

> -Xlog:init means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs.

I don't agree. Initialization logging could encompass many different things, some of which are individually controllable via different flags. Simply turning on init logging should not turn on all such flags. If you want that level of coupling then perhaps use init_counters (or something like that) to make it clear this is not a general log tag intended for any initialization code to use, but something you have chosen to tie to specific functionality.

> We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters

It is not clear to me how you envisage that working. You want individual group switches plus a global one?
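
The "changes the default, but does not force" behaviour that Ioi describes in the quoted text can be modelled outside the VM. The following is only a minimal sketch under stated assumptions: `Flag`, `Origin`, `set_from_command_line`, `set_ergo_if_default` and `apply_init_logging_defaults` are hypothetical names invented for this illustration and are not HotSpot's flag machinery; only the flag names come from the thread.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of where a flag's current value came from.
enum class Origin { Default, CommandLine, Ergonomic };

// Hypothetical stand-in for a boolean VM flag.
struct Flag {
  std::string name;
  bool value = false;
  Origin origin = Origin::Default;
  explicit Flag(std::string n) : name(std::move(n)) {}
};

// Models an explicit -XX:+Name or -XX:-Name given by the user.
inline void set_from_command_line(Flag& f, bool v) {
  f.value = v;
  f.origin = Origin::CommandLine;
}

// Models the idea behind FLAG_SET_ERGO_IF_DEFAULT in the quoted snippet:
// the flag is only touched if nobody has set it yet.
inline void set_ergo_if_default(Flag& f, bool v) {
  if (f.origin == Origin::Default) {
    f.value = v;
    f.origin = Origin::Ergonomic;
  }
}

// Models "if (log_is_enabled(Info, init)) { ... }" from the quoted snippet.
inline void apply_init_logging_defaults(bool init_logging_enabled,
                                        Flag& linkage, Flag& aaa, Flag& bbb) {
  if (!init_logging_enabled) return;
  set_ergo_if_default(linkage, true);
  set_ergo_if_default(aaa, true);
  set_ergo_if_default(bbb, true);
}

// Models "java -Xlog:init -XX:-ProfileAAA".
inline void demo_init_logging_defaults() {
  Flag linkage("ProfileClassLinkage"), aaa("ProfileAAA"), bbb("ProfileBBB");
  set_from_command_line(aaa, false);
  apply_init_logging_defaults(true, linkage, aaa, bbb);
  assert(linkage.value && bbb.value);  // switched on by the logging default
  assert(!aaa.value);                  // the explicit user choice is kept
}
```

In this model the coupling David objects to lives entirely in `apply_init_logging_defaults`; tying it to a more specific tag, as he suggests, would only change the condition under which that function runs.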
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1625355958 From stuefe at openjdk.org Tue Jun 4 05:19:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 4 Jun 2024 05:19:17 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v3] In-Reply-To: <7BwrnI6pyDmsjbYvkV4SrWh6ObNj38Vfla7Lyx3u-Rc=.7e9a3bf9-60b2-46b7-ab45-324ac646400d@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> <7BwrnI6pyDmsjbYvkV4SrWh6ObNj38Vfla7Lyx3u-Rc=.7e9a3bf9-60b2-46b7-ab45-324ac646400d@github.com> Message-ID: <3EdbDDByACAibg6I93c1kZZXerw-t2WZrMZZwhVcy_g=.46bc7a54-f6cf-4f29-a686-12ae5c65aed5@github.com> On Tue, 4 Jun 2024 02:09:13 GMT, David Holmes wrote: >> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. >> >> Adds unit testing for the specialized cases. >> >> See JBS for discussion of other suggestions. >> >> Testing: - tiers 1-4 >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > revert constexpr usage src/hotspot/share/utilities/defaultStream.hpp line 33: > 31: friend void ostream_abort(); > 32: public: > 33: class TestSupport; // Unit test support Interesting variant of the "friend class for test support". I never thought of this. An advantage is that its name is scoped to the testee class, so one does not have to think up a good name. src/hotspot/share/utilities/ostream.cpp line 142: > 140: warning("outputStream::do_vsnprintf output truncated -- buffer length is " SIZE_FORMAT > 141: " bytes but " SIZE_FORMAT " bytes are needed.", > 142: add_cr ? buflen + 1 : buflen, required_len + 1); PrintWarnings defaults to true, right? Ah, I see, you limit to debug builds. Okay. 
test/hotspot/gtest/utilities/test_ostream.cpp line 147:

> 145:   va_list ap;
> 146:   va_start(ap, format);
> 147:   const char* res = tty->do_vsnprintf(buf, len, format, ap, add_cr, rlen);

This was confusing at first glance. Any reason you use tty instead of just defaultStream::do_vsnprintf? Or, outputStream::do_vsnprintf?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625318876
PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625331428
PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625319845

From stuefe at openjdk.org Tue Jun 4 05:19:18 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 4 Jun 2024 05:19:18 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v3]
In-Reply-To: <3EdbDDByACAibg6I93c1kZZXerw-t2WZrMZZwhVcy_g=.46bc7a54-f6cf-4f29-a686-12ae5c65aed5@github.com>
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> <7BwrnI6pyDmsjbYvkV4SrWh6ObNj38Vfla7Lyx3u-Rc=.7e9a3bf9-60b2-46b7-ab45-324ac646400d@github.com> <3EdbDDByACAibg6I93c1kZZXerw-t2WZrMZZwhVcy_g=.46bc7a54-f6cf-4f29-a686-12ae5c65aed5@github.com>
Message-ID: <-cLe5bCXo01vLuQ3x3Hr4zvCGlKW60gu8Ua9DHLdpA8=.ad14af0e-7fdf-41d6-837e-238461802e28@github.com>

On Tue, 4 Jun 2024 04:17:56 GMT, Thomas Stuefe wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   revert constexpr usage
>
> src/hotspot/share/utilities/defaultStream.hpp line 33:
>
>> 31:   friend void ostream_abort();
>> 32:  public:
>> 33:   class TestSupport; // Unit test support
>
> Interesting variant of the "friend class for test support". I never thought of this. An advantage is that its name is scoped to the testee class, so one does not have to think up a good name.

Any reason why you test defaultStream instead of outputStream?
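
For readers unfamiliar with the nested `TestSupport` variant mentioned above, here is a minimal sketch of the general C++ technique. It assumes nothing about the real `defaultStream`: the class `Stream` and its helper `visible_length` are hypothetical names used only for illustration. The point is that a nested class is a member of its enclosing class, so it can reach protected members without any `friend` declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class under test; only the nested-class idea echoes the thread.
class Stream {
 protected:
  // Protected helper that ordinary callers cannot reach.
  static std::size_t visible_length(const char* s) { return std::strlen(s); }

 public:
  class TestSupport;  // declared in the header, defined only by the test
};

// Would live in the test source file. As a member of Stream it has the same
// access rights as any other member, so it may call the protected helper.
class Stream::TestSupport {
 public:
  static std::size_t visible_length(const char* s) {
    return Stream::visible_length(s);
  }
};

// Example of how a unit test could use the forwarding class.
inline void demo_test_support() {
  assert(Stream::TestSupport::visible_length("abc") == 3);
}
```

Because the forwarding class is named `Stream::TestSupport`, every testee class can use the same nested name without clashes, which is the advantage noted in the review comment.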
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625319557

From stuefe at openjdk.org Tue Jun 4 05:19:18 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 4 Jun 2024 05:19:18 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v3]
In-Reply-To:
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
Message-ID:

On Mon, 3 Jun 2024 07:14:19 GMT, David Holmes wrote:

>> src/hotspot/share/utilities/ostream.hpp line 74:
>>
>>> 72:   // of the returned string.
>>> 73:   //
>>> 74:   // In a debug build, if truncation occurs a VM warning is issued.
>>
>> I had to think a bit (I am not a native English speaker) about what the "Nominally" means, but I think it is supposed to contrast the second paragraph? As in "Normally we do that, but in the case of ... we do... ?". Same for "idiomatically" - what does that signify?
>
> Right "nominally" is indicating that it basically operates one way but there are exceptions as outlined in the second paragraph.
>
> "idiomatically" means we are applying a specific coding idiom aka pattern - in this case secure programming says you never pass a non-constant string to a printf-like function, but instead pass "%s" and supply the actual string as the argument. So when we encounter that idiom we will handle it specially.

If you go for a clear behavior description, there are a few pieces of information missing:
- result_len contains the length of the returned string, including trailing cr, excluding trailing nul
- function will always print the trailing cr regardless of truncation (e.g., print_cr("abc") with output buffer of 2 will just return a cr).

But I wonder whether we can shorten the description to something like this:

"Function will return a formatted string. It may bypass the caller-provided output buffer, returning the provided format string directly, if no formatting or copying is needed."
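
The behaviour described in this exchange (constant format strings and the "%s" idiom bypass the buffer, and the trailing cr is always emitted even if the text has to be truncated) can be sketched as below. This is a simplified model written for this discussion, not the HotSpot implementation: `format_line_v` and `format_line` are hypothetical names, the debug-build truncation warning is left out, and the exact truncation arithmetic is an assumption.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Simplified model (not HotSpot code). Returns the string to print and
// stores its strlen() in result_len. Requires buflen >= 2 so that a lone
// "\n" plus the terminating nul always fit.
static const char* format_line_v(char* buf, std::size_t buflen,
                                 const char* format, va_list ap,
                                 bool add_cr, std::size_t& result_len) {
  assert(buflen >= 2);
  // Space usable for text and its nul; one byte is reserved for '\n'.
  const std::size_t room = add_cr ? buflen - 1 : buflen;
  const char* result;
  if (std::strchr(format, '%') == nullptr) {
    result = format;                    // constant string: nothing to format
    result_len = std::strlen(result);
  } else if (std::strcmp(format, "%s") == 0) {
    result = va_arg(ap, const char*);   // "%s" idiom: pass the argument through
    result_len = std::strlen(result);
  } else {
    int needed = std::vsnprintf(buf, room, format, ap);
    if (needed < 0) { buf[0] = '\0'; needed = 0; }      // encoding error
    result = buf;
    result_len = static_cast<std::size_t>(needed);
    if (result_len >= room) result_len = room - 1;      // output was truncated
  }
  if (add_cr) {
    // The newline has to be appended in the caller's buffer, so the text is
    // copied there (and truncated if necessary) before '\n' is added.
    if (result_len >= room) result_len = room - 1;
    if (result != buf) std::memcpy(buf, result, result_len);
    buf[result_len++] = '\n';
    buf[result_len] = '\0';
    result = buf;
  }
  return result;
}

// Variadic convenience wrapper around the va_list version.
static const char* format_line(char* buf, std::size_t buflen, bool add_cr,
                               std::size_t& result_len, const char* format, ...) {
  va_list ap;
  va_start(ap, format);
  const char* res = format_line_v(buf, buflen, format, ap, add_cr, result_len);
  va_end(ap);
  return res;
}
```

In this sketch, calling `format_line` with a two-byte buffer, `add_cr` set and the format "abc" yields just "\n", which corresponds to the print_cr("abc") example in the message above. Without `add_cr`, a constant format string is returned as-is and the buffer is never written.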
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625327284

From stuefe at openjdk.org Tue Jun 4 05:33:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 4 Jun 2024 05:33:16 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v121]
In-Reply-To:
References:
Message-ID:

On Thu, 30 May 2024 13:19:41 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo...
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Check for nullptr afterwards

> ZGC reserves virtual memory _before_ committing "physical" memory. Reporting the physical memory to also be "reserved" memory sounds misleading to me.

Okay, @jdksjolen, I retract my original comment and think using only the "reserved" property is okay as a workaround for now for the ZGC memory tree. We then think of "reserved" as "allocated".

What we should do going forward would be to:
- have a dedicated VMATree for the ZGC memory file that does not carry "reserved" and "committed" but just a single boolean, "allocated". Since we only want to track which parts of the underlying memory file are allocated, right? There are only two states, allocated or not allocated.
- To later reuse VMATree for regular mapping operations, we still need reserved and committed. So, in a future RFE, we could templatify the RegionData property type if we wanted.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2146644007

From stuefe at openjdk.org Tue Jun 4 05:41:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 4 Jun 2024 05:41:20 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v121]
In-Reply-To:
References:
Message-ID:

On Thu, 30 May 2024 13:19:41 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo...
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Check for nullptr afterwards

I had a last look at this yesterday and this morning, and think this is okay to go. We will probably massage this a lot over the next weeks, but it is reasonably well tested. I'll approve now. I leave it up to you and Stefan if you want to change the double-reservation-accounting for ZGC.

Also, good job. This was a lot of work.

-------------

Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2095357061 PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2146653557 From sspitsyn at openjdk.org Tue Jun 4 06:41:06 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 4 Jun 2024 06:41:06 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: <5OI8D0PhkM19awFsxnm6RTlJkaDxkUyvW75D3q-wK0Q=.a2a0262e-9d3c-4380-aafd-e6b7cfc4393a@github.com> References: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> <5OI8D0PhkM19awFsxnm6RTlJkaDxkUyvW75D3q-wK0Q=.a2a0262e-9d3c-4380-aafd-e6b7cfc4393a@github.com> Message-ID: On Fri, 17 May 2024 03:49:21 GMT, Quan Anh Mai wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: corrected the nullptr clarification > > src/hotspot/share/prims/jvmti.xml line 1007: > >> 1005: explicitly deallocate. This is indicated in the individual >> 1006: function descriptions. Empty lists, arrays, sequences, etc are >> 1007: returned as a null pointer (C NULL or C++ nullptr). > > This may be a little unnecessary rigor, but I believe that `nullptr` is not a null pointer. `nullptr` is the pointer literal that can be implicitly converted to a null pointer value of any pointer type and any pointer to member type. And I think the thing returned here is a null pointer, not `nullptr`. I'm not sure I understand this comment. Sorry. 
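
The distinction Quan Anh Mai draws in the quoted comment, between the literal `nullptr` and a null pointer value, can be checked with a few lines of standard C++. This is only an illustrative sketch: `get_empty_list` is a hypothetical function invented here to mimic "an empty list is returned as a null pointer" and is not a JVMTI function.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// nullptr is a literal of type std::nullptr_t, which is not a pointer type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");
static_assert(!std::is_pointer<std::nullptr_t>::value,
              "std::nullptr_t is not itself a pointer type");

// Hypothetical stand-in for a function that reports an empty list by
// storing a null pointer value in its out-parameter.
inline void get_empty_list(int* count_ptr, long** list_ptr) {
  *count_ptr = 0;
  *list_ptr = nullptr;  // implicitly converted to a null pointer of type long*
}

// The caller receives a null pointer value of type long*, which compares
// equal to both the C++ literal nullptr and the C macro NULL.
inline void demo_empty_list() {
  int count = -1;
  long dummy = 0;
  long* list = &dummy;
  get_empty_list(&count, &list);
  assert(count == 0);
  assert(list == nullptr);
  assert(list == NULL);
}
```

So what the caller gets back is a null pointer value of the relevant pointer type; `NULL` and `nullptr` are simply the two spellings a C or C++ agent would use to test for it.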
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1625430986 From sspitsyn at openjdk.org Tue Jun 4 06:41:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 4 Jun 2024 06:41:08 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 04:48:04 GMT, David Holmes wrote: >> src/hotspot/share/prims/jvmti.xml line 1007: >> >>> 1005: explicitly deallocate. This is indicated in the individual >>> 1006: function descriptions. Empty lists, arrays, sequences, etc are >>> 1007: returned as a null pointer (C NULL or C++ nullptr). >> >> Why describe what is meant by a "null pointer" here when it is not done elsewhere? > > The intent is to provide a definition of what a null pointer is, for both C and C++ programs. Is there a better place to do that so that elsewhere the spec can simply to refer to "a null pointer" or "null"? Thanks, David. I also feel this clarification is still useful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1625432133 From sspitsyn at openjdk.org Tue Jun 4 06:46:17 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 4 Jun 2024 06:46:17 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v4] In-Reply-To: References: Message-ID: On Mon, 3 Jun 2024 22:55:24 GMT, Serguei Spitsyn wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. 
>> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. >> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. >> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: get rid of unneeded casts in new test Thank you for review, Alex! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19438#issuecomment-2146732081 From dholmes at openjdk.org Tue Jun 4 06:56:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Jun 2024 06:56:01 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v3] In-Reply-To: <-cLe5bCXo01vLuQ3x3Hr4zvCGlKW60gu8Ua9DHLdpA8=.ad14af0e-7fdf-41d6-837e-238461802e28@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> <7BwrnI6pyDmsjbYvkV4SrWh6ObNj38Vfla7Lyx3u-Rc=.7e9a3bf9-60b2-46b7-ab45-324ac646400d@github.com> <3EdbDDByACAibg6I93c1kZZXerw-t2WZrMZZwhVcy_g=.46bc7a54-f6cf-4f29-a686-12ae5c65aed5@github.com> <-cLe5bCXo01vLuQ3x3Hr4zvCGlKW60gu8Ua9DHLdpA8=.ad14af0e-7fdf-41d6-837e-238461802e28@github.com> Message-ID: On Tue, 4 Jun 2024 04:19:03 GMT, Thomas Stuefe wrote: >> src/hotspot/share/utilities/defaultStream.hpp line 33: >> >>> 31: friend void ostream_abort(); >>> 32: public: >>> 33: class TestSupport; // Unit test support >> >> Interesting variant of the "friend class for test support". I never thought of this. 
An advantage is that its name is scoped to the testee class, so one does not have to think up a good name.
>
> Any reason why you test defaultStream instead of outputStream?

Because I use `tty` which is a `defaultStream` so that is where the test code has to be placed to access the protected function. BTW this pattern was suggested by Kim Barrett in preference to the use of a friend class.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625448806

From alanb at openjdk.org Tue Jun 4 07:04:02 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Tue, 4 Jun 2024 07:04:02 GMT
Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Jun 2024 06:38:36 GMT, Serguei Spitsyn wrote:

>> The intent is to provide a definition of what a null pointer is, for both C and C++ programs. Is there a better place to do that so that elsewhere the spec can simply refer to "a null pointer" or "null"?
>
> Thanks, David. I also feel this clarification is still useful.

I think this is the right place but it is only for return values. There are a few functions where a parameter value can be a null pointer, e.g. in GetThreadState, SuspendThread, GetOwnedMonitorInfo the thread parameter can be a null pointer to mean the current thread. I don't think the introduction section has anywhere right now to say what a null pointer means.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1625458511

From dholmes at openjdk.org Tue Jun 4 07:05:10 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 4 Jun 2024 07:05:10 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v3]
In-Reply-To: <3EdbDDByACAibg6I93c1kZZXerw-t2WZrMZZwhVcy_g=.46bc7a54-f6cf-4f29-a686-12ae5c65aed5@github.com>
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> <7BwrnI6pyDmsjbYvkV4SrWh6ObNj38Vfla7Lyx3u-Rc=.7e9a3bf9-60b2-46b7-ab45-324ac646400d@github.com> <3EdbDDByACAibg6I93c1kZZXerw-t2WZrMZZwhVcy_g=.46bc7a54-f6cf-4f29-a686-12ae5c65aed5@github.com>
Message-ID: <_sV3FxX1WBEh15dnB483_tKqFXqvA43kwQwW29tPI1Q=.952d1838-3a2b-4159-ad1d-a50a8e210c3b@github.com>

On Tue, 4 Jun 2024 04:36:30 GMT, Thomas Stuefe wrote:

>> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   revert constexpr usage
>
> src/hotspot/share/utilities/ostream.cpp line 142:
>
>> 140:     warning("outputStream::do_vsnprintf output truncated -- buffer length is " SIZE_FORMAT
>> 141:             " bytes but " SIZE_FORMAT " bytes are needed.",
>> 142:             add_cr ? buflen + 1 : buflen, required_len + 1);
>
> PrintWarnings defaults to true, right? Ah, I see, you limit to debug builds. Okay.

Yes same as before.

> test/hotspot/gtest/utilities/test_ostream.cpp line 147:
>
>> 145:   va_list ap;
>> 146:   va_start(ap, format);
>> 147:   const char* res = tty->do_vsnprintf(buf, len, format, ap, add_cr, rlen);
>
> This was confusing at first glance. Any reason you use tty instead of just defaultStream::do_vsnprintf? Or, outputStream::do_vsnprintf?

Ha! OMG I never even noticed `do_vsnprintf` is static - everything else is an instance method so I used `tty`.
Okay I can simplify this and move the test class to outputStream

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625453279
PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625452545

From dholmes at openjdk.org Tue Jun 4 07:05:10 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 4 Jun 2024 07:05:10 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v3]
In-Reply-To:
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
Message-ID:

On Tue, 4 Jun 2024 04:29:38 GMT, Thomas Stuefe wrote:

> result_len contains the length of the returned string, including trailing cr, excluding trailing nul

Yes I state that it is the length of the returned string as per strlen.

Yes I can clarify cr is always printed.

> It may bypass the caller-provided output buffer, returning the provided format string directly, if no formatting or copying is needed

Well it _will_ bypass in those cases, though you then need to know when "no copying is needed". The idea was to clearly document the behaviour so that people would no longer be surprised or make assumptions/interpretations.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625457465

From mli at openjdk.org Tue Jun 4 07:05:37 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 4 Jun 2024 07:05:37 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3]
In-Reply-To:
References:
Message-ID: <5nb7z7duJwm6LdqipIOVokMeG02rOBdaIezqfu7hSFE=.cbdbdba5-0922-44bb-9b83-bd88f80b7a12@github.com>

On Fri, 31 May 2024 12:13:32 GMT, Fei Yang wrote:

>>> Like `NativeInstruction::is_li16u` which delegates work to `MacroAssembler::is_li16u_at`.
>>
>> I don't find `NativeInstruction::is_li16u`, maybe you want to say something else for the delegation you mentioned?
>>
>> Take `MacroAssembler::is_li16u_at` as an example: I moved it to macroAssembler, because it's used in macroAssembler too. So one of the principles I'd like to stick to in this refactoring is to make these 2 classes' communication unidirectional, so maybe it's better to move `MacroAssembler::is_li16u_at` too.
>
>> > Like `NativeInstruction::is_li16u` which delegates work to `MacroAssembler::is_li16u_at`.
>>
>> I don't find `NativeInstruction::is_li16u`, maybe you want to say something else for the delegation you mentioned?
>
> Never mind. I think I misread the code.
>
>> Take `MacroAssembler::is_li16u_at` as an example: I moved it to macroAssembler, because it's used in macroAssembler too. So one of the principles I'd like to stick to in this refactoring is to make these 2 classes' communication unidirectional, so maybe it's better to move `MacroAssembler::is_li16u_at` too.
>
> Yeah. Your change becomes interesting to me now. I am having another check.

Thanks @RealFYang @luhenry @robehn for your reviewing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2146761997

From mli at openjdk.org Tue Jun 4 07:08:07 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 4 Jun 2024 07:08:07 GMT
Subject: Integrated: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp
In-Reply-To:
References:
Message-ID:

On Wed, 29 May 2024 15:47:05 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review the patch?
> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintenance.
> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction.
>
> Thanks!
>
> * Tests are still running, so far so good.

This pull request has now been integrated.
Changeset: 454660d3
Author: Hamlin Li
URL: https://git.openjdk.org/jdk/commit/454660d361e39f362ff0e10a5c2389af910cca23
Stats: 775 lines in 6 files changed: 354 ins; 341 del; 80 mod

8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp

Reviewed-by: fyang, luhenry

-------------

PR: https://git.openjdk.org/jdk/pull/19459

From dholmes at openjdk.org Tue Jun 4 07:34:34 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 4 Jun 2024 07:34:34 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4]
In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
Message-ID:

> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued.
>
> Adds unit testing for the specialized cases.
>
> See JBS for discussion of other suggestions.
>
> Testing: - tiers 1-4
>
> Thanks

David Holmes has updated the pull request incrementally with one additional commit since the last revision:

  Simplified the test code - thanks @tstuefe!
  Rewrote the comment block describing do_vsnprintf.
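
For readers following the `do_vsnprintf` discussion, the truncation rule for the plain-string path can be sketched roughly as below. This is a simplified, standalone illustration and not the HotSpot code: `copy_through_sketch` is a hypothetical name, it handles only a constant string (no format specifiers and no `va_list`), and it assumes the behaviour described in the review, namely that the trailing CR (a newline character here) is always emitted even when the text is truncated, and that the caller's buffer is bypassed when no CR is requested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only; not the HotSpot function.
// Returns the string to print and stores its strlen() in result_len.
const char* copy_through_sketch(char* buffer, std::size_t buflen,
                                const char* str, bool add_cr,
                                std::size_t& result_len) {
  assert(buflen >= 2 && "need room for at least the newline and the nul");
  result_len = std::strlen(str);
  if (!add_cr) {
    // No formatting and no copying needed: bypass the buffer entirely.
    return str;
  }
  // Reserve two bytes: one for '\n' and one for the terminating nul.
  const std::size_t max_text = buflen - 2;
  if (result_len > max_text) {
    result_len = max_text;  // truncate the text, never the newline
  }
  std::memcpy(buffer, str, result_len);
  buffer[result_len++] = '\n';
  buffer[result_len] = '\0';
  return buffer;
}
```

With a 2-byte buffer and the input `abc`, this sketch returns just the newline, which corresponds to the `print_cr("abc")` example raised in the review.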
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19512/files - new: https://git.openjdk.org/jdk/pull/19512/files/eeee7f57..1aef14ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=02-03 Stats: 43 lines in 3 files changed: 5 ins; 1 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/19512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19512/head:pull/19512 PR: https://git.openjdk.org/jdk/pull/19512 From dholmes at openjdk.org Tue Jun 4 07:37:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Jun 2024 07:37:04 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4] In-Reply-To: References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Tue, 4 Jun 2024 07:00:56 GMT, David Holmes wrote: >> If you go for a clear behavior description, there are a few infos missing: >> - result_len contains the length of the returned string, including trailing cr, excluding trailing nul >> - function will always print the trailing cr regardless of truncation (e.g., print_cr("abc") with output buffer of 2 will just return a cr). >> >> But I wonder whether we can shorten the description to something like this: >> >> "Function will return a formatted string. It may bypass the caller-provided output buffer, returning the provided format string directly, if no formatting or copying is needed." > >> result_len contains the length of the returned string, including trailing cr, excluding trailing nul > > Yes I state that it is the length of the returned string as per strlen. > > Yes I can clarify cr is always printed. > >> It may bypass the caller-provided output buffer, returning the provided format string directly, if no formatting or copying is needed > > Well it _will_ bypass in those cases, though you then need to know when "no copying is needed". 
The idea was to clearly document the behaviour so that people would no longer be surprised or make assumptions/interpretations.

I've simplified and clarified and even expanded the comment block describing the behaviour.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1625504156

From stuefe at openjdk.org Tue Jun 4 07:45:02 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 4 Jun 2024 07:45:02 GMT
Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4]
In-Reply-To:
References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com>
Message-ID:

On Tue, 4 Jun 2024 07:34:34 GMT, David Holmes wrote:

>> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued.
>>
>> Adds unit testing for the specialized cases.
>>
>> See JBS for discussion of other suggestions.
>>
>> Testing: - tiers 1-4
>>
>> Thanks
>
> David Holmes has updated the pull request incrementally with one additional commit since the last revision:
>
> Simplified the test code - thanks @tstuefe!
> Rewrote the comment block describing do_vsnprintf.

Good. Thank you for taking this on.

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19512#pullrequestreview-2095579155

From gcao at openjdk.org Tue Jun 4 07:57:14 2024
From: gcao at openjdk.org (Gui Cao)
Date: Tue, 4 Jun 2024 07:57:14 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3]
In-Reply-To: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com>
References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com>
Message-ID:

On Tue, 4 Jun 2024 02:11:35 GMT, Gui Cao wrote:

>> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
>> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
>> Original:
>>
>> Benchmark                                   Mode  Cnt    Score     Error  Units
>> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
>> Seco...
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix Code format

JMH tested on Banana Pi BPI-F3 board (has Zba,Zbb,Zbs) and Enable UseZba, UseZbb, UseZbs

Original(not with patch):

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   16.405 ± 0.221  ns/op
SecondarySupersLookup.testNegative01  avgt   15   18.276 ± 0.251  ns/op
SecondarySupersLookup.testNegative02  avgt   15   21.519 ± 0.314  ns/op
SecondarySupersLookup.testNegative03  avgt   15   24.434 ± 0.370  ns/op
SecondarySupersLookup.testNegative04  avgt   15   27.246 ± 0.441  ns/op
SecondarySupersLookup.testNegative05  avgt   15   29.652 ± 0.397  ns/op
SecondarySupersLookup.testNegative06  avgt   15   32.049 ± 0.533  ns/op
SecondarySupersLookup.testNegative07  avgt   15   33.568 ± 0.501  ns/op
SecondarySupersLookup.testNegative08  avgt   15   35.593 ± 0.606  ns/op
SecondarySupersLookup.testNegative09  avgt   15   37.220 ± 0.407  ns/op
SecondarySupersLookup.testNegative10  avgt   15   39.522 ± 0.511  ns/op
SecondarySupersLookup.testNegative16  avgt   15   51.146 ± 0.667  ns/op
SecondarySupersLookup.testNegative20  avgt   15   58.404 ± 0.654  ns/op
SecondarySupersLookup.testNegative30  avgt   15   77.190 ± 0.796  ns/op
SecondarySupersLookup.testNegative32  avgt   15   81.144 ± 0.761  ns/op
SecondarySupersLookup.testNegative40  avgt   15   96.018 ± 0.733  ns/op
SecondarySupersLookup.testNegative50  avgt   15  115.170 ± 0.876  ns/op
SecondarySupersLookup.testNegative55  avgt   15  125.827 ± 4.322  ns/op
SecondarySupersLookup.testNegative56  avgt   15  126.151 ± 0.979  ns/op
SecondarySupersLookup.testNegative57  avgt   15  129.638 ± 4.326  ns/op
SecondarySupersLookup.testNegative58  avgt   15  131.960 ± 4.584  ns/op
SecondarySupersLookup.testNegative59  avgt   15  131.403 ± 1.051  ns/op
SecondarySupersLookup.testNegative60  avgt   15  133.660 ± 0.888  ns/op
SecondarySupersLookup.testNegative61  avgt   15  137.293 ± 4.852  ns/op
SecondarySupersLookup.testNegative62  avgt   15  137.476 ± 1.081  ns/op
SecondarySupersLookup.testNegative63  avgt   15  139.028 ± 1.026  ns/op
SecondarySupersLookup.testNegative64  avgt   15  143.545 ± 5.011  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.729 ± 0.149  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.724 ± 0.140  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.152  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.726 ± 0.143  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.735 ± 0.158  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.727 ± 0.145  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.749 ± 0.170  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.733 ± 0.155  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.733 ± 0.153  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.735 ± 0.158  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch:

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   13.270 ± 0.201  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.261 ± 0.194  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.265 ± 0.198  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.274 ± 0.222  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.262 ± 0.197  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.264 ± 0.200  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.259 ± 0.193  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.255 ± 0.186  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.267 ± 0.200  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.265 ± 0.200  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.277 ± 0.220  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.270 ± 0.209  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.279 ± 0.223  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.284 ± 0.232  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.288 ± 0.237  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.290 ± 0.241  ns/op
SecondarySupersLookup.testNegative55  avgt   15   51.179 ± 0.761  ns/op
SecondarySupersLookup.testNegative56  avgt   15   51.175 ± 0.763  ns/op
SecondarySupersLookup.testNegative57  avgt   15   51.550 ± 1.070  ns/op
SecondarySupersLookup.testNegative58  avgt   15   51.182 ± 0.737  ns/op
SecondarySupersLookup.testNegative59  avgt   15   51.169 ± 0.773  ns/op
SecondarySupersLookup.testNegative60  avgt   15   74.605 ± 1.445  ns/op
SecondarySupersLookup.testNegative61  avgt   15   74.434 ± 1.006  ns/op
SecondarySupersLookup.testNegative62  avgt   15   74.587 ± 1.078  ns/op
SecondarySupersLookup.testNegative63  avgt   15  155.881 ± 0.856  ns/op
SecondarySupersLookup.testNegative64  avgt   15  158.028 ± 5.778  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.744 ± 0.176  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.725 ± 0.143  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.148  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.734 ± 0.156  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.726 ± 0.144  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.731 ± 0.152  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.729 ± 0.150  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.728 ± 0.147  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.745 ± 0.181  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.153  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.734 ± 0.158  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.732 ± 0.154  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2146851987

From stuefe at openjdk.org Tue Jun 4 08:06:10 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 4 Jun 2024 08:06:10 GMT
Subject: RFR: 8332105: Exploded JDK does not include CDS
In-Reply-To:
References:
Message-ID: <3r1Fqo8nbGE_yqNiy4q29P7CTz5CfkpLs5tPDk5cWzo=.8448b1bb-2c75-414f-b18a-3050d9ea2982@github.com>

On Sat, 11 May 2024 06:13:29 GMT, Thomas Stuefe wrote:

> An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK.
>
> Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included.
>
>
> ----
>
> Note that I was torn between two ways to fix this:
>
> - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property
> - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`).
See this draft PR: https://github.com/openjdk/jdk/pull/19178 .
>
> The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required.
>
> I can go either way, though I have a slight preference for this PR, which is why I posted it.

Holding this off until post RDP1 23, not that important

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2146870548

From mli at openjdk.org Tue Jun 4 08:58:12 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 4 Jun 2024 08:58:12 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3]
In-Reply-To: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com>
References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com>
Message-ID:

On Tue, 4 Jun 2024 02:11:35 GMT, Gui Cao wrote:

>> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
>> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
>> Original:
>>
>> Benchmark                                   Mode  Cnt    Score     Error  Units
>> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
>> Seco...
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix Code format

There is a bit of a regression in the testNegative63/64 cases. Although these might be rare or uncommon cases, it's worth trying to improve them if possible.
I think it's related to the implementation for the case when the bitmap is full: if it's full, there are some instructions to execute before going to `repne_scan`. I wonder if it would help to have another "bitmap full test" just after the "bitmap false test" (which is `test_bit(t0, r_bitmap, bit);`). But I'm not sure if it's feasible; it may be worth a try.
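
The "bitmap full" case mentioned above can be pictured with a small, portable model of a bitmap-indexed lookup. This is only a sketch under stated assumptions, not the HotSpot or RISC-V implementation: the names `PackedTable`, `home_slot`, `build` and `contains`, the hash function, and the probing scheme are all hypothetical, and `popcount64` merely stands in for a hardware population count such as the Zbb `cpop` instruction. The model assumes at most 64 distinct keys.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Portable population count; stands in for a hardware instruction.
inline unsigned popcount64(std::uint64_t x) {
  unsigned n = 0;
  while (x != 0) { x &= x - 1; ++n; }  // clear the lowest set bit
  return n;
}

// Hypothetical 6-bit hash: picks one of 64 slots for a key.
inline unsigned home_slot(std::uint64_t key) {
  return static_cast<unsigned>((key * 0x9E3779B97F4A7C15ULL) >> 58);
}

struct PackedTable {
  std::uint64_t bitmap = 0;            // bit s set <=> slot s is occupied
  std::vector<std::uint64_t> entries;  // occupied slots only, in slot order
};

// Assumes keys are distinct and keys.size() <= 64.
PackedTable build(const std::vector<std::uint64_t>& keys) {
  assert(keys.size() <= 64);
  std::array<std::uint64_t, 64> slots{};
  PackedTable t;
  for (std::uint64_t k : keys) {
    unsigned s = home_slot(k);
    while ((t.bitmap >> s) & 1) s = (s + 1) & 63;  // linear probing
    slots[s] = k;
    t.bitmap |= (1ULL << s);
  }
  for (unsigned s = 0; s < 64; ++s) {
    if ((t.bitmap >> s) & 1) t.entries.push_back(slots[s]);
  }
  return t;
}

bool contains(const PackedTable& t, std::uint64_t key) {
  if (t.bitmap == ~0ULL) {
    // Full bitmap: no empty slot can stop a probe, so scan everything.
    for (std::uint64_t e : t.entries) {
      if (e == key) return true;
    }
    return false;
  }
  unsigned s = home_slot(key);
  while ((t.bitmap >> s) & 1) {
    // Index into the packed array = number of occupied slots below s.
    unsigned idx = popcount64(t.bitmap & ((1ULL << s) - 1));
    if (t.entries[idx] == key) return true;
    s = (s + 1) & 63;
  }
  return false;  // reached an empty slot: definite miss
}
```

In this model a miss on a sparsely populated bitmap usually ends after a single bit test, while a full bitmap always pays for the linear scan. Whether a separate early "bitmap full" check would help in the actual port is the open question raised in the comment above.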
------------- PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2146976623 From varadam at openjdk.org Tue Jun 4 09:19:20 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 4 Jun 2024 09:19:20 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places Message-ID: PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. ------------- Commit messages: - [PPC64] saving and restoring CR is not needed at most places - [PPC64] saving and restoring CR is not needed at most places Changes: https://git.openjdk.org/jdk/pull/19494/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331733 Stats: 75 lines in 12 files changed: 13 ins; 3 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/19494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19494/head:pull/19494 PR: https://git.openjdk.org/jdk/pull/19494 From mdoerr at openjdk.org Tue Jun 4 09:19:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 4 Jun 2024 09:19:20 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places In-Reply-To: References: Message-ID: <39aW2Z66k-tf-QGM-sDRe3SmfQOHC-_xTarZ2xdW8W4=.d8b21d43-41ff-4e52-b1df-c69bd892c37f@github.com> On Fri, 31 May 2024 08:56:36 GMT, Varada M wrote: > PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. src/hotspot/cpu/ppc/gc/shared/barrierSetAssembler_ppc.cpp line 345: > 343: } > 344: } else if (vm_reg->is_ConditionRegister()) { > 345: // NOP. Conditions registers are covered by save_LR This comment is no longer correct. I don't think that we ever save or restore condition registers at this point. So, I think we can replace this comment by `ShouldNotReachHere(); // live condition registers are unexpected at this point`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19494#discussion_r1622505227 From varadam at openjdk.org Tue Jun 4 09:19:20 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 4 Jun 2024 09:19:20 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places In-Reply-To: <39aW2Z66k-tf-QGM-sDRe3SmfQOHC-_xTarZ2xdW8W4=.d8b21d43-41ff-4e52-b1df-c69bd892c37f@github.com> References: <39aW2Z66k-tf-QGM-sDRe3SmfQOHC-_xTarZ2xdW8W4=.d8b21d43-41ff-4e52-b1df-c69bd892c37f@github.com> Message-ID: On Fri, 31 May 2024 14:24:46 GMT, Martin Doerr wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. > > src/hotspot/cpu/ppc/gc/shared/barrierSetAssembler_ppc.cpp line 345: > >> 343: } >> 344: } else if (vm_reg->is_ConditionRegister()) { >> 345: // NOP. Conditions registers are covered by save_LR > > This comment is no longer correct. I don't think that we ever save or restore condition registers at this point. So, I think we can replace this comment by `ShouldNotReachHere(); // live condition registers are unexpected at this point`. Thanks @TheRealMDoerr . I have changed the comment ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19494#discussion_r1625649264 From sgehwolf at openjdk.org Tue Jun 4 09:54:27 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 4 Jun 2024 09:54:27 GMT Subject: RFR: 8333446: Add tests for hierarchical container support Message-ID: Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. 
It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. I'm adding those tests in order to not regress another time. Testing: - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) - [x] GHA ------------- Commit messages: - Fix comments - 8333446: Add tests for hierarchical container support Changes: https://git.openjdk.org/jdk/pull/19530/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333446 Stats: 489 lines in 8 files changed: 482 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From sgehwolf at openjdk.org Tue Jun 4 09:54:27 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 4 Jun 2024 09:54:27 GMT Subject: RFR: 8333446: Add tests for hierarchical container support In-Reply-To: References: Message-ID: On Mon, 3 Jun 2024 17:28:09 GMT, Severin Gehwolf wrote: > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) > - [x] GHA GHA failure of macos-x64 seems infra related and not related to this patch. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2147097626 From jsjolen at openjdk.org Tue Jun 4 10:33:43 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Jun 2024 10:33:43 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v122] In-Reply-To: References: Message-ID: <7GsGYT_S2F419TgL3UZcnoUt8nz2i0lmEiG5mWKrlCY=.edf90cb9-0a02-4314-a4be-e99b36e78461@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at 
the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
>
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Delete dead code and only account for committed memory in summary

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/fd165407..ae73fda8

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=121
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=120-121

Stats: 10 lines in 2 files changed: 0 ins; 9 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/18289.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org Tue Jun 4 10:53:49 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 4 Jun 2024 10:53:49 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v123]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`.
Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
> > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Use zu ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/ae73fda8..ee8c7da3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=122 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=121-122 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From mli at openjdk.org Tue Jun 4 10:57:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 4 Jun 2024 10:57:03 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2] In-Reply-To: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com> References: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com> Message-ID: On Mon, 3 Jun 2024 12:55:17 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. 
>> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. >> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision:
>
> - Merge branch 'master' into 8332689
> - Remove accidental files
> - Remove accidental files
> - Baseline

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 90:

> 88:
> 89: bool NativeInstruction::is_call_at(address addr) {
> 90:   return NativeCall::is_at(addr);

parent class calling into child class?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1625794254

From jsjolen at openjdk.org Tue Jun 4 10:57:09 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 4 Jun 2024 10:57:09 GMT
Subject: RFR: 8331193: Return references when possible in GrowableArray [v11]
In-Reply-To:
References:
Message-ID:

On Tue, 28 May 2024 16:09:19 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`.
>>
>>
>> Some example code:
>> ```c++
>> // Before this patch this worked:
>> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s
>> int& x = arr.at(7);
>> if (x == -1) {
>>   x = 2;
>> }
>> assert(arr.at(7) == 2, "this holds");
>> // but this was forbidden
>> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E&
>> // so we had to do
>> int x = arr.at_grow(9, -1);
>> if (x == -1) {
>>   arr.at_put(9, 2);
>> }
>> ```
>>
>> Thanks.
>
> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/return-reference' into return-reference
> - Also add test for first and last

Thanks!
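
The reference-returning `at_grow` described in the quoted PR text can be illustrated with a minimal sketch. It uses a hypothetical `TinyArray` built on `std::vector`, not HotSpot's `GrowableArray`; the class, its constructor, and the helper `set_if_unset` are assumptions made only for illustration.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a growable array; NOT HotSpot's GrowableArray.
template <typename E>
class TinyArray {
  std::vector<E> _data;

 public:
  TinyArray(std::size_t initial_len, const E& fill) : _data(initial_len, fill) {}

  // Bounds-checked access that returns a reference, like at().
  E& at(std::size_t i) {
    assert(i < _data.size() && "index out of bounds");
    return _data[i];
  }

  // Grows the array if needed (new slots get 'fill'), then returns a
  // reference to the slot so the caller can update it in place.
  E& at_grow(std::size_t i, const E& fill) {
    if (i >= _data.size()) {
      _data.resize(i + 1, fill);
    }
    return _data[i];
  }

  std::size_t length() const { return _data.size(); }
};

// Mirrors the "after the patch" usage: one call, then an in-place update.
inline int set_if_unset(TinyArray<int>& arr, std::size_t index) {
  int& x = arr.at_grow(index, -1);
  if (x == -1) {
    x = 2;
  }
  return arr.at(index);
}
```

In this sketch the returned reference is only valid until the next call that grows the array, since growth may move the underlying storage.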
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18975#issuecomment-2147229294

From jsjolen at openjdk.org Tue Jun 4 10:57:10 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 4 Jun 2024 10:57:10 GMT
Subject: Integrated: 8331193: Return references when possible in GrowableArray
In-Reply-To:
References:
Message-ID: <-gXXQphAkuJggDSeMt7y4VBnOTogSmTDRFa4w_8ryQU=.6c97630d-17a0-4713-a529-ee7f451d7dbc@github.com>

On Fri, 26 Apr 2024 11:58:43 GMT, Johan Sjölen wrote:

> Hi,
>
> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`.
>
>
> Some example code:
> ```c++
> // Before this patch this worked:
> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s
> int& x = arr.at(7);
> if (x == -1) {
>   x = 2;
> }
> assert(arr.at(7) == 2, "this holds");
> // but this was forbidden
> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E&
> // so we had to do
> int x = arr.at_grow(9, -1);
> if (x == -1) {
>   arr.at_put(9, 2);
> }
> ```
>
> Thanks.

This pull request has now been integrated.
Changeset: 0f4154a9 Author: Johan Sj?len URL: https://git.openjdk.org/jdk/commit/0f4154a9e9805534595feccc53a4a1abf20f99ae Stats: 46 lines in 2 files changed: 41 ins; 0 del; 5 mod 8331193: Return references when possible in GrowableArray Reviewed-by: stefank, kbarrett, epeter ------------- PR: https://git.openjdk.org/jdk/pull/18975 From duke at openjdk.org Tue Jun 4 11:03:15 2024 From: duke at openjdk.org (kuaiwei) Date: Tue, 4 Jun 2024 11:03:15 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: <0pKp3n0P2TTZOBuB5CkdEQE9aQysSaoEWStYuh_npzo=.e40d186b-1748-45fc-9d02-164f002b23b4@github.com> On Wed, 29 May 2024 11:11:35 GMT, Andrew Haley wrote: >> Cursory review: > > This looks ready to me. I think we need jcstress with C1 and C2, and we should be done. @shipilev , do you agree? @theRealAph @shipilev Could you help sponsor it ? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2147244816 From mli at openjdk.org Tue Jun 4 11:09:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 4 Jun 2024 11:09:09 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2] In-Reply-To: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com> References: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com> Message-ID: On Mon, 3 Jun 2024 12:55:17 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. 
>> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. >> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into 8332689 > - Remove accidental files > - Remove accidental files > - Baseline I see new classes are added in nativeInst, maybe the comments at the top of nativeInst.hpp needs updated accordingly. 
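
The reach condition in the quoted description (a direct JAL is usable only when **dest** is within +/- 1 MB of the call site) can be sketched as a small predicate. This is an illustrative sketch, not HotSpot code: the name `in_jal_reach` and the use of plain integers for addresses are assumptions. The bounds follow from JAL's signed, 2-byte-aligned 21-bit offset.

```c++
#include <cstdint>

// Hypothetical helper (not a HotSpot function): returns true if a JAL placed
// at 'pc' can branch directly to 'dest'.
inline bool in_jal_reach(std::int64_t pc, std::int64_t dest) {
  const std::int64_t offset = dest - pc;
  const std::int64_t limit = std::int64_t{1} << 20;  // 1 MiB
  // JAL encodes the offset in multiples of 2 bytes.
  const bool aligned = (offset % 2) == 0;
  // Valid offsets are -2^20 .. 2^20 - 2 inclusive.
  return aligned && offset >= -limit && offset < limit;
}
```

A call site whose target falls outside this range needs some indirect sequence, which is where the trampoline stub or the AUIPC/LD/JALR sequence from the description comes in.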
------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2147252958 From jsjolen at openjdk.org Tue Jun 4 11:10:39 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Jun 2024 11:10:39 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v124] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. 
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Do not copy paste random phrases into the Copyright statement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/ee8c7da3..482144be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=123 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=122-123 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From rehn at openjdk.org Tue Jun 4 11:50:29 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Jun 2024 11:50:29 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v3] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). 
> Using a very small application or running very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. > > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits:

 - Cleanup
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Remove accidental files
 - Remove accidental files
 - Baseline

-------------

Changes: https://git.openjdk.org/jdk/pull/19453/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=02
  Stats: 809 lines in 15 files changed: 601 ins; 110 del; 98 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From rehn at openjdk.org Tue Jun 4 11:52:32 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 4 Jun 2024 11:52:32 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2]
In-Reply-To:
References: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com>
Message-ID:

On Tue, 4 Jun 2024 10:54:49 GMT, Hamlin Li wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>> - Merge branch 'master' into 8332689
>> - Remove accidental files
>> - Remove accidental files
>> - Baseline
>
> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 90:
>
>> 88:
>> 89: bool NativeInstruction::is_call_at(address addr) {
>> 90:   return NativeCall::is_at(addr);
>
> parent class calling into child class?

We have this everywhere; HotSpot is not an OO design, e.g.:
`bool Thread::is_JavaThread_protected(const JavaThread* target)`
The parent class has a method which takes a child as a parameter.
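
The pattern under discussion, a static predicate on a base class that delegates to a derived class, can be sketched in isolation as follows. The class names `InstructionSketch` and `CallSketch` are hypothetical stand-ins, and treating any JAL opcode as "a call" is a deliberate simplification, not what HotSpot's `NativeCall::is_at` checks.

```c++
#include <cstdint>

// Hypothetical base class standing in for a generic instruction view.
class InstructionSketch {
 public:
  // Declared here, defined below once the derived class is complete.
  static bool is_call_at(const std::uint32_t* addr);
};

// Hypothetical derived class that knows what a call looks like.
class CallSketch : public InstructionSketch {
 public:
  static constexpr std::uint32_t kOpcodeMask = 0x7f;
  static constexpr std::uint32_t kJalOpcode = 0x6f;  // RISC-V JAL major opcode

  // Simplification: any JAL counts as a call in this sketch.
  static bool is_at(const std::uint32_t* addr) {
    return (*addr & kOpcodeMask) == kJalOpcode;
  }
};

// The base-class predicate forwards to the derived class.
inline bool InstructionSketch::is_call_at(const std::uint32_t* addr) {
  return CallSketch::is_at(addr);
}
```

The base class only needs a declaration of its own method up front; the definition that names the derived class comes after that class is complete, which is why the dependency compiles even though it points "downwards".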
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1625857951 From jsjolen at openjdk.org Tue Jun 4 12:09:34 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Jun 2024 12:09:34 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v125] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. 
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - It's OK that _device is nullptr - Add the free_file method for Instance ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/482144be..f1dd3096 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=124 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=123-124 Stats: 6 lines in 2 files changed: 5 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From mli at openjdk.org Tue Jun 4 12:12:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 4 Jun 2024 12:12:30 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2] In-Reply-To: References: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com> Message-ID: On Tue, 4 Jun 2024 11:49:29 GMT, 
Robbin Ehn wrote:

>> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 90:
>>
>>> 88:
>>> 89: bool NativeInstruction::is_call_at(address addr) {
>>> 90:   return NativeCall::is_at(addr);
>>
>> parent class calling into child class?
>
> We have this everywhere; HotSpot is not an OO design, e.g.:
> `bool Thread::is_JavaThread_protected(const JavaThread* target)`
> The parent class has a method which takes a child as a parameter.

You're right, it's not an OO design. But that's not an argument for breaking it casually.
Maybe in this case (`is_call_at`) it is necessary to do so? Or can we find a better way? I'm not sure, I'll check further.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1625883535

From jsjolen at openjdk.org Tue Jun 4 12:35:25 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 4 Jun 2024 12:35:25 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v121]
In-Reply-To:
References:
Message-ID:

On Tue, 4 Jun 2024 05:30:19 GMT, Thomas Stuefe wrote:

> ZGC reserves virtual memory _before_ committing "physical" memory. Reporting the physical memory to also be "reserved" memory sounds misleading to me.

Good. We have an assert in NMT that `reserved >= committed`, so this is necessary for our tests to pass at least. This has no effect on production builds, of course.

I've fixed the code so that we report `MemoryFileTracker` memory as committed usage only in summary mode. I'm running tier1-tier3 in our CI system, which I expect will pass. I'm also doing some additional manual testing. I'll integrate after everything is green.
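
The summary-mode behaviour described in this thread, where memory allocated in a tracked file counts as committed usage for its flag, can be pictured with a small accounting sketch. Everything here is hypothetical: `MemoryFileSketch`, `MemFlagSketch`, and the rule that a free must match an earlier allocation exactly are assumptions for illustration, and the sketch uses a plain `std::map` rather than the Treap mentioned in the PR description.

```c++
#include <cstddef>
#include <map>

// Hypothetical flag type standing in for MEMFLAGS.
enum class MemFlagSketch { JavaHeap, Other };

// Hypothetical per-file tracker: records allocations by offset and reports
// committed bytes per flag, as a summary report might.
class MemoryFileSketch {
  struct Region {
    std::size_t size;
    MemFlagSketch flag;
  };
  std::map<std::size_t, Region> _regions;  // keyed by offset into the file

 public:
  void allocate_memory(std::size_t offset, std::size_t size, MemFlagSketch flag) {
    _regions[offset] = Region{size, flag};
  }

  // Assumption: a free must match a previous allocation exactly.
  bool free_memory(std::size_t offset, std::size_t size) {
    auto it = _regions.find(offset);
    if (it == _regions.end() || it->second.size != size) {
      return false;
    }
    _regions.erase(it);
    return true;
  }

  // Sum of all live allocations carrying the given flag.
  std::size_t committed(MemFlagSketch flag) const {
    std::size_t total = 0;
    for (const auto& entry : _regions) {
      if (entry.second.flag == flag) {
        total += entry.second.size;
      }
    }
    return total;
  }
};
```

Note that nothing in this sketch touches a "reserved" counter, matching the point made above that file-backed memory is reported as committed only.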
------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2147421280 From coleenp at openjdk.org Tue Jun 4 15:45:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Jun 2024 15:45:11 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: On Tue, 4 Jun 2024 15:36:36 GMT, Coleen Phillimore wrote: >> The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` >> >> Where >> * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` >> * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` >> * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` >> >> So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` >> >> Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. >> >> The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. 
>> >> Running testing tier1-7 > > src/hotspot/share/classfile/symbolTable.cpp line 185: > >> 183: private: >> 184: static void* allocate_node_impl(size_t size, Value const& value) { >> 185: size_t alloc_size = SymbolTableHash::get_dynamic_node_size(value.byte_size()); > > So 'size' passed in is actually sizeof(NODE), right or is it sizeof(NODE) + sizeof(VALUE) ? So also to fix it, don't we just remove value.byte_size() from this calculation? The real problem is that size passed in is known to the concurrent hash table but not to the caller, so maybe this does help make this less error prone. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1626241160 From coleenp at openjdk.org Tue Jun 4 15:45:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 4 Jun 2024 15:45:10 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: On Mon, 13 May 2024 12:30:38 GMT, Axel Boldt-Christmas wrote: > The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` > > Where > * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` > * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` > * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` > > So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` > > Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. 
It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. > > The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. > > Running testing tier1-7 Yes, I think this is a helpful patch and fixes the bug. src/hotspot/share/classfile/symbolTable.cpp line 185: > 183: private: > 184: static void* allocate_node_impl(size_t size, Value const& value) { > 185: size_t alloc_size = SymbolTableHash::get_dynamic_node_size(value.byte_size()); So 'size' passed in is actually sizeof(NODE), right or is it sizeof(NODE) + sizeof(VALUE) ? So also to fix it, don't we just remove value.byte_size() from this calculation? src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1065: > 1063: assert(value_size >= sizeof(VALUE), "must include the VALUE"); > 1064: return sizeof(Node) - sizeof(VALUE) + value_size; > 1065: } I'm not sure if this makes it less error prone, if I'm right that the size passed in is sizeof(Node) + sizeof(VALUE). ------------- Marked as reviewed by coleenp (Reviewer). 
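
The double counting described in the PR text, and the corrected `sizeof(Node) - sizeof(VALUE) + value_size` formula quoted in the review, can be checked with a small worked sketch. `ValueSketch` and `NodeSketch` are hypothetical layouts chosen so that the value ends exactly where the node ends; they are not the real `Symbol` or CHT node types, and `extra` stands for the payload bytes beyond the fixed part of the value.

```c++
#include <cstddef>
#include <cstdint>

// Hypothetical value with a small inline body; a longer payload would be
// allocated directly behind it.
struct ValueSketch {
  std::uint32_t hash;
  std::uint16_t length;
  std::uint8_t body[2];
};

// Hypothetical node: one pointer of metadata followed by the value.
struct NodeSketch {
  NodeSketch* next;
  ValueSketch value;
};

// The sketch relies on there being no padding after the value.
static_assert(sizeof(NodeSketch) == sizeof(void*) + sizeof(ValueSketch),
              "value must end where the node ends");

// Dynamic size of the value: fixed part plus extra payload bytes.
constexpr std::size_t dynamic_value_size(std::size_t extra) {
  return sizeof(ValueSketch) + extra;
}

// Old calculation: node size + dynamic value size + payload length, which
// counts the value (and its payload) twice.
constexpr std::size_t old_alloc_size(std::size_t extra) {
  return sizeof(NodeSketch) + dynamic_value_size(extra) + extra;
}

// Corrected calculation, following the formula quoted in the review.
constexpr std::size_t get_dynamic_node_size(std::size_t value_size) {
  return sizeof(NodeSketch) - sizeof(ValueSketch) + value_size;
}
```

With these definitions the old formula overshoots the corrected one by exactly the dynamic size of the value, which is the "twice the required memory" in the subject line (apart from the node's pointer).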
PR Review: https://git.openjdk.org/jdk/pull/19214#pullrequestreview-2096778534 PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1626235425 PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1626236348 From jsjolen at openjdk.org Tue Jun 4 15:51:40 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Jun 2024 15:51:40 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: On Mon, 13 May 2024 12:30:38 GMT, Axel Boldt-Christmas wrote: > The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` > > Where > * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` > * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` > * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` > > So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` > > Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. > > The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). 
I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. > > Running testing tier1-7 Hi Axel, I don't understand why the patch isn't just `size_t alloc_size = size + value.effective_length()`, as this would be `sizeof(void*) + sizeof(Symbol) + sizeof()`. Could you explain that, please? Thank you. src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1065: > 1063: assert(value_size >= sizeof(VALUE), "must include the VALUE"); > 1064: return sizeof(Node) - sizeof(VALUE) + value_size; > 1065: } Style, suggestion: ```c++ template inline size_t ConcurrentHashTable::get_dynamic_node_size(size_t value_size) { That's 88 characters, well within what we accept. ------------- PR Review: https://git.openjdk.org/jdk/pull/19214#pullrequestreview-2096695407 PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1626185397 From iklam at openjdk.org Tue Jun 4 15:54:26 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 4 Jun 2024 15:54:26 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> Message-ID: <9k9sDnon7SehCf1oFouQwWKo2YBTlSwZrBTN3e3ezzw=.070cccf2-cfc6-4a51-9bd6-02e17eacc50b@github.com> On Tue, 4 Jun 2024 05:13:58 GMT, David Holmes wrote: >>> The -Xlog:init (perhaps with a better name/tag!) >> >> I'm all for a better naming scheme. Any suggestions? > >> -Xlog:init means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs. > > I don't agree. 
Initialization logging could encompass many different things, some of which are individually controllable via different flags. Simply turning on init logging should not turn on all such flags. If you want that level of coupling then perhaps use init_counters (or something like that) to make it clear this is not a general log tag intended for any initialization code to use, but something you have chosen to tie to specific functionality. > >> We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters > > It is not clear to me how you envisage that working. You want individual group switches plus a global one? OK, I agree that not all `-Xlog:init` logs are related to timing. We actually have an existing tag that's used for conditional logging. Perhaps we should use that instead? ./share/logging/logTag.hpp: LOG_TAG(startuptime) \ ./share/memory/universe.cpp: TraceTime timer("Genesis", TRACETIME_LOG(Info, startuptime)); ./share/prims/methodHandles.cpp: TraceTime timer("MethodHandles adapters generation", TRACETIME_LOG(Info, startuptime)); ./share/runtime/stubRoutines.cpp: TraceTime timer(timer_msg, TRACETIME_LOG(Info, startuptime)); ./share/runtime/threads.cpp: TraceTime timer("Initialize module system", TRACETIME_LOG(Info, startuptime)); ./share/runtime/threads.cpp: TraceTime timer("Initialize java.lang classes", TRACETIME_LOG(Info, startuptime)); ./share/runtime/threads.cpp: TraceTime timer("Initialize java.lang.invoke classes", TRACETIME_LOG(Info, startuptime)); ./share/runtime/threads.cpp: TraceTime timer("Create VM", TRACETIME_LOG(Info, startuptime)); ./share/runtime/threads.cpp: { TraceTime timer("Start VMThread", TRACETIME_LOG(Info, startuptime)); ./share/runtime/timerTrace.hpp:// TraceTime t("some timer", TIMERTRACE_LOG(Info, startuptime, tagX...)); ./share/utilities/ostream.cpp: // lazily create log file (at startup, LogVMOutput is false even $ java -Xlog:startuptime --version [0.010s][info][startuptime]
StubRoutines generation initial stubs, 0.0006132 secs [0.025s][info][startuptime] Genesis, 0.0145142 secs [0.025s][info][startuptime] StubRoutines generation continuation stubs, 0.0000198 secs [0.028s][info][startuptime] Interpreter generation, 0.0005919 secs [0.028s][info][startuptime] StubRoutines generation final stubs, 0.0000717 secs [0.028s][info][startuptime] MethodHandles adapters generation, 0.0000101 secs [0.029s][info][startuptime] Start VMThread, 0.0000846 secs [0.032s][info][startuptime] Initialize java.lang classes, 0.0036507 secs [0.033s][info][startuptime] Initialize java.lang.invoke classes, 0.0002106 secs [0.034s][info][startuptime] StubRoutines generation compiler stubs, 0.0015061 secs [0.035s][info][startuptime] Initialize module system, 0.0017471 secs [0.035s][info][startuptime] Create VM, 0.0305224 secs java 23-internal 2024-09-17 Java(TM) SE Runtime Environment (build 23-internal-adhoc.iklam.zoo) Java HotSpot(TM) 64-Bit Server VM (build 23-internal-adhoc.iklam.zoo, mixed mode, sharing) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1626256873 From mdoerr at openjdk.org Tue Jun 4 16:11:10 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 4 Jun 2024 16:11:10 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places In-Reply-To: References: Message-ID: <_oOJA6AJ729zZn5EZpXF-W5h9CfU_IoX5NtY119trrg=.737da5ff-5a8f-42a5-b823-02fb3e000bf4@github.com> On Fri, 31 May 2024 08:56:36 GMT, Varada M wrote: > PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. > Fastdebug: build and tier1 testing successful. [unrelated failures] Please make xBarrierSetAssembler_ppc.cpp consistent with the normal BarrierSetAssembler. "// NOP. Conditions registers are covered by save_LR_CR" should get replaced, too. 
Saving and restoring CR in `RegisterSaver::push_frame_reg_args_and_save_live_registers` and `RegisterSaver::restore_live_registers_and_pop_frame` are not needed, either. Otherwise, this PR looks good to me. I'll run tests. (Will retest when you make updates.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2147765836 From rehn at openjdk.org Tue Jun 4 16:34:34 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Jun 2024 16:34:34 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v4] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running for a very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get fast hot calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists.
> > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Move shart/far code to cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19453/files - new: https://git.openjdk.org/jdk/pull/19453/files/d42d9e58..c4c02f2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=02-03 Stats: 323 lines in 4 files changed: 156 ins; 156 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From rehn at openjdk.org Tue Jun 4 16:34:35 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Jun 2024 16:34:35 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v2] In-Reply-To: References: <3W3z-PDFsRFSclrP3FJRmnEjL4rLDRSUEFN5qkFxSUI=.feb03562-9ca3-4383-94cd-967d4234a4aa@github.com> Message-ID: On Tue, 4 Jun 2024 11:05:19 GMT, Hamlin Li wrote: > I see new classes are added in nativeInst, maybe the comments at the top of nativeInst.hpp needs updated 
accordingly. They are all private, so I moved them to the cpp file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2147947271 From rehn at openjdk.org Tue Jun 4 16:34:38 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Jun 2024 16:34:38 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v3] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 11:50:29 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get fast hot calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Cleanup > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Remove accidental files > - Remove accidental files > - Baseline It just passed t1(+UseTramp) and t1-3, I'll restart due to the large merge. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2147951836 From never at openjdk.org Tue Jun 4 16:36:49 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 4 Jun 2024 16:36:49 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v3] In-Reply-To: References: Message-ID: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. 
Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz - Fix riscv compilation - 8333300: [JVMCI] add support for generational ZGC - Merge branch 'master' into tkr-genz - Merge branch 'master' into tkr-genz - Use NativeAccess to read from handles - Enable support for UseEpsilonGC - Fix riscv compilation - 8333300: [JVMCI] add support for generational ZGC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19490/files - new: https://git.openjdk.org/jdk/pull/19490/files/bb91b42c..8e0cf360 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19490&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19490&range=01-02 Stats: 6305 lines in 231 files changed: 3674 ins; 1867 del; 764 mod Patch: https://git.openjdk.org/jdk/pull/19490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19490/head:pull/19490 PR: https://git.openjdk.org/jdk/pull/19490 From never at openjdk.org Tue Jun 4 16:36:49 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 4 Jun 2024 16:36:49 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v2] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 21:03:30 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. 
>> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Fix riscv compilation There was one fix I needed to include to use NativeAccess to read from the JVMCI handles when repacking them. It's only checking for null and not actually accessing the contents so it was benignly broken with singlegen ZGC. Generational ZGC includes new verify oop logic which was catching this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2147954438 From mbaesken at openjdk.org Tue Jun 4 16:53:43 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 4 Jun 2024 16:53:43 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 Message-ID: When building with ubsan, we see a number of overflows at this code location : /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 #4 0x10c552f6b in PhaseOutput::Output() output.cpp:367 #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 #7 0x10b859912 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:142 #8 0x10b9dd4f1 in 
CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 #13 0x10c53ece4 in thread_native_entry(Thread*) os_bsd.cpp:598 ------------- Commit messages: - JDK-8331854 Changes: https://git.openjdk.org/jdk/pull/19541/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19541&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331854 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19541/head:pull/19541 PR: https://git.openjdk.org/jdk/pull/19541 From dnsimon at openjdk.org Tue Jun 4 17:05:04 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 4 Jun 2024 17:05:04 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v3] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 16:36:49 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC > - Merge branch 'master' into tkr-genz > - Merge branch 'master' into tkr-genz > - Use NativeAccess to read from handles > - Enable support for UseEpsilonGC > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19490#pullrequestreview-2096958170 From shade at openjdk.org Tue Jun 4 17:07:00 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 4 Jun 2024 17:07:00 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v7] In-Reply-To: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com> References: <6ibfHj015tewMFWCGCYCSBr1-DIlkbebYAGXuoC_h58=.2a4336e2-fcdf-4c68-a61b-e6f65cdb5eb8@github.com> Message-ID: On Mon, 3 Jun 2024 06:45:22 GMT, kuaiwei wrote: >> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: >> 1 It shows a regression on some platforms, such as Apple silicon on macOS >> 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" >> >> It can be fixed by: >> 1 Enable AlwaysMergeDMB by default, and only disable it on architectures where we can see a performance improvement (N1 or N2) >> 2 Check the special pattern and merge the subsequent dmb. >> >> It also fixes a bug: when the code buffer is expanding, st/ld/dmb cannot be merged. I added unit tests for these. >> >> This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and cannot merge all three, because when emitting dmb.ish, merging all previous dmbs would shrink the size of the code buffer.
I think it may break some resumption and think it's not a common pattern. >> >> In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Use constexpr for test encoding GH says the branch has conflicts that must be resolved. Merging from master is advisable anyway to avoid merge surprises. Let's do that before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2148011161 From kvn at openjdk.org Tue Jun 4 17:11:56 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 4 Jun 2024 17:11:56 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: <8dewise-Q2txbAkaSdsy1mZdYCGtWf9n-XmY1c7Y8HY=.1c1bc7a5-1cdc-4860-a8c7-31db0f343390@github.com> On Tue, 4 Jun 2024 14:11:42 GMT, Matthias Baesken wrote: > When building with ubsan, we see a number of overflows at this code location : > > /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 > #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 > #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 > #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 > #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 > #4 0x10c552f6b in PhaseOutput::Output() output.cpp:367 > #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 > #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 > #7 0x10b859912 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) 
c2compiler.cpp:142 > #8 0x10b9dd4f1 in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 > #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 > #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 > #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 > #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 > #13 0x10c53ece4 in thread_native_entry(Thread*) os_bsd.cpp:598 Good. I read through comments in bug report and this fix makes sense. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19541#pullrequestreview-2096969564 From cjplummer at openjdk.org Tue Jun 4 19:03:00 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 4 Jun 2024 19:03:00 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 07:01:54 GMT, Alan Bateman wrote: >> Thanks, David. I also feel this clarification is still useful. > > I think this is the right place but it is only for return values. There are a few functions where a parameter value can be a null pointer, e.g. in GetThreadState, SuspendThread, GetOwnedMonitorInfo the thread parameter can be a null pointer to mean the current thread. I don't think the introduction section has anywhere right now to say what a null pointer means. Yes, my point was that this section is only for return values. The section is titled "Function Return Values". Maybe we should add another short section just before this one to describe what is meant by "null pointer". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1626480428 From amitkumar at openjdk.org Tue Jun 4 19:12:24 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 4 Jun 2024 19:12:24 GMT Subject: RFR: 8333382: [s390x] Move population_count implementation out of ad file Message-ID: We need to move the popcnt instruction implementation out of the s390.ad file, as it is required by some methods present in [JDK-8331126](https://bugs.openjdk.org/browse/JDK-8331126). When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 field is zero, a count of the number of one bits in each of the eight bytes of general register R2 is placed into the corresponding byte of general register R1. Each byte of general register R1 is an 8-bit binary integer in the range of 0-8. When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field is one, a count of the total number of one bits in the 64-bit general register R2 is placed into general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. Performed tier1 tests on a fastdebug build and didn't see any regressions.
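For readers who want to see the two POPCNT behaviours quoted above side by side, here is a minimal, portable C++ sketch. It is not the s390 macro assembler code from the PR; the function names `popcnt_per_byte`, `popcnt_total` and `sum_bytes` are hypothetical and only model the semantics described in the quoted text. The last helper shows one common way (an assumption here, not taken from the patch) to fold the per-byte counts into a single total when only the per-byte form is available.

```cpp
#include <cstdint>

// Hypothetical model of POPCNT without facility 3 (or with M3 bit 0 == 0):
// each byte of the result holds the number of one bits (0..8) found in the
// corresponding byte of the input.
uint64_t popcnt_per_byte(uint64_t x) {
  uint64_t result = 0;
  for (int byte = 0; byte < 8; byte++) {
    uint64_t b = (x >> (8 * byte)) & 0xFF;
    uint64_t ones = 0;
    while (b != 0) {
      ones += b & 1;
      b >>= 1;
    }
    result |= ones << (8 * byte);
  }
  return result;
}

// Hypothetical model of POPCNT with facility 3 and M3 bit 0 == 1:
// the result is the total number of one bits (0..64) in the 64-bit input.
uint64_t popcnt_total(uint64_t x) {
  uint64_t ones = 0;
  while (x != 0) {
    ones += x & 1;
    x >>= 1;
  }
  return ones;
}

// Fold eight per-byte counts into one total. Multiplying by 0x0101...01
// accumulates all eight bytes into the top byte; the sum is at most 64,
// so no partial sum overflows a byte.
uint64_t sum_bytes(uint64_t per_byte) {
  return (per_byte * 0x0101010101010101ULL) >> 56;
}
```

With these definitions, `sum_bytes(popcnt_per_byte(x))` and `popcnt_total(x)` agree for any input, which is the equivalence a fallback path for machines without facility 3 would need to preserve.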
------------- Commit messages: - cleanup - split method for long & int - restrict previous match rules - facility 3 support - only relocation - pop_count Changes: https://git.openjdk.org/jdk/pull/19509/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333382 Stats: 126 lines in 5 files changed: 99 ins; 16 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From amitkumar at openjdk.org Tue Jun 4 19:12:24 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 4 Jun 2024 19:12:24 GMT Subject: RFR: 8333382: [s390x] Move population_count implementation out of ad file In-Reply-To: References: Message-ID: On Sat, 1 Jun 2024 13:15:45 GMT, Amit Kumar wrote: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5814: > 5812: } else { > 5813: > 5814: z_illtrap(48); // fixme: remove I'll remove this before making this PR ready for review. 
As of now I'm using z16 machine, So this is not producing any error. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1625759934 From aph at openjdk.org Tue Jun 4 19:12:24 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 4 Jun 2024 19:12:24 GMT Subject: RFR: 8333382: [s390x] Move population_count implementation out of ad file In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 10:28:47 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5814: > >> 5812: } else { >> 5813: >> 5814: z_illtrap(48); // fixme: remove > > I'll remove this before making this PR ready for review. As of now I'm using z16 machine, So this is not producing any error. I wouldn't use a bool here. The implementations of long and int are different, might as well have two routines. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1625838376 From jsjolen at openjdk.org Tue Jun 4 19:26:28 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 4 Jun 2024 19:26:28 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v126] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment.
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Add 'from' in report ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/f1dd3096..8fd26a38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=125 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=124-125 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Tue Jun 4 19:26:28 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Jun 2024 19:26:28 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v125] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 12:09:34 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. 
Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. 
When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - It's OK that _device is nullptr > - Add the free_file method for Instance Passes tier1-tier3. Added the word 'from' that was missing in the report. Integrating tomorrow to be able to keep my eye on the testing progressing throughout the day. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2148241302 From kbarrett at openjdk.org Tue Jun 4 19:38:58 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 4 Jun 2024 19:38:58 GMT Subject: RFR: 8326085: Remove unnecessary UpcallContext constructor In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:42:48 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the explicit constructor to UpcallContext (hotspot/share/prims/upcallLinker.cpp) that was added as workaround for [8286891](https://bugs.openjdk.org/browse/JDK-8286891). > > The minimum required version of XLC has since been bumped in [8325880](https://bugs.openjdk.org/browse/JDK-8325880), so we can remove this. > > Thanks, > Sonia Looks good, assuming it passes testing on the ppc-aix port. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18982#pullrequestreview-2097242688 From ayang at openjdk.org Tue Jun 4 21:17:56 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 4 Jun 2024 21:17:56 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 14:11:42 GMT, Matthias Baesken wrote: > When building with ubsan, we see a number of overflows at this code location : > > /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 > #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 > #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 > #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 > #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 > #4 0x10c552f6b in PhaseOutput::Output() output.cpp:367 > #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 > #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 > #7 0x10b859912 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:142 > #8 0x10b9dd4f1 in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 > #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 > #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 > #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 > #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 > #13 0x10c53ece4 in thread_native_entry(Thread*) os_bsd.cpp:598 I wonder if using for-loop works here. (Since #iterations is known, for-loop seems more natural.) 
for (size_t i = 0; i < count; ++i) { to[count - 1 - i] = from[count - 1 - i]; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/19541#issuecomment-2148428451 From ccheung at openjdk.org Tue Jun 4 21:50:00 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 4 Jun 2024 21:50:00 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <9k9sDnon7SehCf1oFouQwWKo2YBTlSwZrBTN3e3ezzw=.070cccf2-cfc6-4a51-9bd6-02e17eacc50b@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> <9k9sDnon7SehCf1oFouQwWKo2YBTlSwZrBTN3e3ezzw=.070cccf2-cfc6-4a51-9bd6-02e17eacc50b@github.com> Message-ID: <2H_eDsrF-8iXVs6DKjKoLzXaMQ0ZlJzXCVu0eQE51U4=.4ac41a7e-6534-4310-9b00-30644b16c358@github.com> On Tue, 4 Jun 2024 15:51:54 GMT, Ioi Lam wrote: >>> -Xlog:init means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs. >> >> I don't agree. Initialization logging could encompass many different things, some of which are individually controllable via different flags. Simply turning on init logging should not turn on all such flags. If you want that level of coupling then perhaps use init_counters (or something like that) to make it clear this is not a general log tag intended for any initialization code to use, but something you have chosen to tie to specific functionality. >> >>> We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters >> >> It is not clear to me how you envisage that working. You want individual group switches plus a global one? > > OK, I agree that not `-Xlog:init` logs are related to timing. 
> > We actually have an existing tag that's used for conditional logging. Perhaps we should use that instead? > > > ./share/logging/logTag.hpp: LOG_TAG(startuptime) \ > ./share/memory/universe.cpp: TraceTime timer("Genesis", TRACETIME_LOG(Info, startuptime)); > ./share/prims/methodHandles.cpp: TraceTime timer("MethodHandles adapters generation", TRACETIME_LOG(Info, startuptime)); > ./share/runtime/stubRoutines.cpp: TraceTime timer(timer_msg, TRACETIME_LOG(Info, startuptime)); > ./share/runtime/threads.cpp: TraceTime timer("Initialize module system", TRACETIME_LOG(Info, startuptime)); > ./share/runtime/threads.cpp: TraceTime timer("Initialize java.lang classes", TRACETIME_LOG(Info, startuptime)); > ./share/runtime/threads.cpp: TraceTime timer("Initialize java.lang.invoke classes", TRACETIME_LOG(Info, startuptime)); > ./share/runtime/threads.cpp: TraceTime timer("Create VM", TRACETIME_LOG(Info, startuptime)); > ./share/runtime/threads.cpp: { TraceTime timer("Start VMThread", TRACETIME_LOG(Info, startuptime)); > ./share/runtime/timerTrace.hpp:// TraceTime t("some timer", TIMERTRACE_LOG(Info, startuptime, tagX...)); > ./share/utilities/ostream.cpp: // lazily create log file (at startup, LogVMOutput is false even > > $ java -Xlog:startuptime --version > [0.010s][info][startuptime] StubRoutines generation initial stubs, 0.0006132 secs > [0.025s][info][startuptime] Genesis, 0.0145142 secs > [0.025s][info][startuptime] StubRoutines generation continuation stubs, 0.0000198 secs > [0.028s][info][startuptime] Interpreter generation, 0.0005919 secs > [0.028s][info][startuptime] StubRoutines generation final stubs, 0.0000717 secs > [0.028s][info][startuptime] MethodHandles adapters generation, 0.0000101 secs > [0.029s][info][startuptime] Start VMThread, 0.0000846 secs > [0.032s][info][startuptime] Initialize java.lang classes, 0.0036507 secs > [0.033s][info][startuptime] Initialize java.lang.invoke classes, 0.0002106 secs > [0.034s][info][startuptime] StubRoutines generation 
compiler stubs, 0.0015061 secs > [0.035s][info][startuptime] Initialize module system, 0.0017471 secs > [0.035s][info][startuptime] Create VM, 0.0305224 secs > java 23-internal 2024-09-17 > Java(TM) SE Runtime Environment (build 23-internal-adhoc.iklam.zoo) > Java HotSpot(TM) 64-Bit Server VM (build 23-internal-adhoc.iklam.zoo, mixed mode, sharing) The `TraceTime` doesn't fit what we need well. All the function using `TraceTime `will be called only once. I tried it in `LinkResolver::resolve_invokehandle` void LinkResolver::resolve_invokehandle(CallInfo& result, const constantPoolHandle& pool, int index, TRAPS) { TraceTime trace_timer("Resolve invokehandle", TRACETIME_LOG(Info, startuptime)); When I run a HelloWorld with `-Xlog:startuptime`, I saw: [0.008s][info][startuptime] StubRoutines generation initial stubs, 0.0005199 secs [0.060s][info][startuptime] Genesis, 0.0519067 secs [0.060s][info][startuptime] StubRoutines generation continuation stubs, 0.0000791 secs [0.069s][info][startuptime] Interpreter generation, 0.0022987 secs [0.082s][info][startuptime] StubRoutines generation final stubs, 0.0012770 secs [0.082s][info][startuptime] MethodHandles adapters generation, 0.0001008 secs [0.082s][info][startuptime] Start VMThread, 0.0001566 secs [0.092s][info][startuptime] Initialize java.lang classes, 0.0101517 secs [0.094s][info][startuptime] Initialize java.lang.invoke classes, 0.0007457 secs [0.097s][info][startuptime] StubRoutines generation compiler stubs, 0.0035488 secs [0.100s][info][startuptime] Initialize module system, 0.0064275 secs [0.101s][info][startuptime] Create VM, 0.1007149 secs [0.135s][info][startuptime] Resolve invokehandle, 0.0004090 secs [0.135s][info][startuptime] Resolve invokehandle, 0.0000021 secs [0.135s][info][startuptime] Resolve invokehandle, 0.0000009 secs [0.137s][info][startuptime] Resolve invokehandle, 0.0002939 secs [0.137s][info][startuptime] Resolve invokehandle, 0.0000017 secs [0.139s][info][startuptime] Resolve invokehandle, 
0.0001010 secs [0.139s][info][startuptime] Resolve invokehandle, 0.0000016 secs [0.143s][info][startuptime] Resolve invokehandle, 0.0000026 secs hello, world Also, I think `TraceTime` doesn't have a counter to track the number of times the function is called. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1626637404 From sspitsyn at openjdk.org Tue Jun 4 23:56:59 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 4 Jun 2024 23:56:59 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 19:00:32 GMT, Chris Plummer wrote: >> I think this is the right place but it is only for return values. There are a few functions where a parameter value can be a null pointer, e.g. in GetThreadState, SuspendThread, GetOwnedMonitorInfo the thread parameter can be a null pointer to mean the current thread. I don't think the introduction section has anywhere right now to reference for parameters that can be NULL/nullptr. > > Yes, my point was that this section is only for return values. The section is titled "Function Return Values". Maybe we should add another short section just before this one to describe what is meant by "null pointer". Okay, thanks. What about the following: : diff --git a/src/hotspot/share/prims/jvmti.xml b/src/hotspot/share/prims/jvmti.xml index a6ebd0d42c5..a81014c70bb 100644 --- a/src/hotspot/share/prims/jvmti.xml +++ b/src/hotspot/share/prims/jvmti.xml @@ -995,7 +995,10 @@ jvmtiEnv *jvmti; across threads and are created dynamically. - + + There are a few functions where a parameter value can be a null pointer + (C NULL or C++ nullptr), e.g. the thread parameter + can be a null pointer to mean the current thread. functions always return an error code via the function return value. @@ -1004,7 +1007,7 @@ jvmtiEnv *jvmti; In some cases, functions allocate memory that your program must explicitly deallocate. 
This is indicated in the individual function descriptions. Empty lists, arrays, sequences, etc are - returned as a null pointer (C NULL or C++ nullptr). + returned as a null pointer.

In the event that the function encounters an error (any return value other than JVMTI_ERROR_NONE) the values I can try to add a couple of more examples where a null pointer can be passed as a parameter value if it is desirable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1626756885 From duke at openjdk.org Wed Jun 5 02:14:05 2024 From: duke at openjdk.org (duke) Date: Wed, 5 Jun 2024 02:14:05 GMT Subject: Withdrawn: 8316930: HotSpot should use noexcept instead of throw() In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 02:50:57 GMT, Julian Waters wrote: > throw() has been deprecated since C++11 alongside dynamic exception specifications, we should replace all instances of it with noexcept to prepare HotSpot for later versions of C++ This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/15910 From duke at openjdk.org Wed Jun 5 03:52:24 2024 From: duke at openjdk.org (kuaiwei) Date: Wed, 5 Jun 2024 03:52:24 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v8] In-Reply-To: References: Message-ID: > he origin patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It show regression in some platform, like Apple silicon in mac os > 2 Can not handle instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > It can be fixed by: > 1 Enable AlwaysMergeDMB by default, only disable it in architecture we can see performance improvement (N1 or N2) > 2 Check the special pattern and merge the subsequent dmb. > > It also fix a bug when code buffer is expanding, st/ld/dmb can not be merged. I added unit tests for these. > > This patch still has a unhandled case. Insts like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and can not merge all three. Because when emitting dmb.ish, if merge all previous dmbs, the code buffer will shrink the size. 
I think it may break some resumption and think it's not a common pattern. > > In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting. kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge master - Use constexpr for test encoding - Add comment in aarch64.ad - Remove tailing white space - Refine merge dmb test cases - Add more unit tests - Make MacroAssembler::merge more clear - 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier ------------- Changes: https://git.openjdk.org/jdk/pull/19278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=07 Stats: 523 lines in 9 files changed: 510 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/19278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278 PR: https://git.openjdk.org/jdk/pull/19278 From ccheung at openjdk.org Wed Jun 5 04:04:57 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 5 Jun 2024 04:04:57 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <2H_eDsrF-8iXVs6DKjKoLzXaMQ0ZlJzXCVu0eQE51U4=.4ac41a7e-6534-4310-9b00-30644b16c358@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> <9k9sDnon7SehCf1oFouQwWKo2YBTlSwZrBTN3e3ezzw=.070cccf2-cfc6-4a51-9bd6-02e17eacc50b@github.com> <2H_eDsrF-8iXVs6DKjKoLzXaMQ0ZlJzXCVu0eQE51U4=.4ac41a7e-6534-4310-9b00-30644b16c358@github.com> Message-ID: On Tue, 4 Jun 2024 21:47:07 GMT, Calvin Cheung wrote: >> OK, I agree that not `-Xlog:init` logs are related to timing. 
>> >> We actually have an existing tag that's used for conditional logging. Perhaps we should use that instead? >> >> >> ./share/logging/logTag.hpp: LOG_TAG(startuptime) \ >> ./share/memory/universe.cpp: TraceTime timer("Genesis", TRACETIME_LOG(Info, startuptime)); >> ./share/prims/methodHandles.cpp: TraceTime timer("MethodHandles adapters generation", TRACETIME_LOG(Info, startuptime)); >> ./share/runtime/stubRoutines.cpp: TraceTime timer(timer_msg, TRACETIME_LOG(Info, startuptime)); >> ./share/runtime/threads.cpp: TraceTime timer("Initialize module system", TRACETIME_LOG(Info, startuptime)); >> ./share/runtime/threads.cpp: TraceTime timer("Initialize java.lang classes", TRACETIME_LOG(Info, startuptime)); >> ./share/runtime/threads.cpp: TraceTime timer("Initialize java.lang.invoke classes", TRACETIME_LOG(Info, startuptime)); >> ./share/runtime/threads.cpp: TraceTime timer("Create VM", TRACETIME_LOG(Info, startuptime)); >> ./share/runtime/threads.cpp: { TraceTime timer("Start VMThread", TRACETIME_LOG(Info, startuptime)); >> ./share/runtime/timerTrace.hpp:// TraceTime t("some timer", TIMERTRACE_LOG(Info, startuptime, tagX...)); >> ./share/utilities/ostream.cpp: // lazily create log file (at startup, LogVMOutput is false even >> >> $ java -Xlog:startuptime --version >> [0.010s][info][startuptime] StubRoutines generation initial stubs, 0.0006132 secs >> [0.025s][info][startuptime] Genesis, 0.0145142 secs >> [0.025s][info][startuptime] StubRoutines generation continuation stubs, 0.0000198 secs >> [0.028s][info][startuptime] Interpreter generation, 0.0005919 secs >> [0.028s][info][startuptime] StubRoutines generation final stubs, 0.0000717 secs >> [0.028s][info][startuptime] MethodHandles adapters generation, 0.0000101 secs >> [0.029s][info][startuptime] Start VMThread, 0.0000846 secs >> [0.032s][info][startuptime] Initialize java.lang classes, 0.0036507 secs >> [0.033s][info][startuptime] Initialize java.lang.invoke classes, 0.0002106 secs >> [0.034s][info][startuptime] 
StubRoutines generation compiler stubs, 0.0015061 secs >> [0.035s][info][startuptime] Initialize module system, 0.0017471 secs >> [0.035s][info][startuptime] Create VM, 0.0305224 secs >> java 23-internal 2024-09-17 >> Java(TM) SE Runtime Environment (build 23-internal-adhoc.iklam.zoo) >> Java HotSpot(TM) 64-Bit Server VM (build 23-internal-adhoc.iklam.zoo, mixed mode, sharing) > > The `TraceTime` doesn't fit what we need well. All the function using `TraceTime `will be called only once. > I tried it in `LinkResolver::resolve_invokehandle` > > void LinkResolver::resolve_invokehandle(CallInfo& result, const constantPoolHandle& pool, int index, TRAPS) { > > TraceTime trace_timer("Resolve invokehandle", TRACETIME_LOG(Info, startuptime)); > > When I run a HelloWorld with `-Xlog:startuptime`, I saw: > > [0.008s][info][startuptime] StubRoutines generation initial stubs, 0.0005199 secs > [0.060s][info][startuptime] Genesis, 0.0519067 secs > [0.060s][info][startuptime] StubRoutines generation continuation stubs, 0.0000791 secs > [0.069s][info][startuptime] Interpreter generation, 0.0022987 secs > [0.082s][info][startuptime] StubRoutines generation final stubs, 0.0012770 secs > [0.082s][info][startuptime] MethodHandles adapters generation, 0.0001008 secs > [0.082s][info][startuptime] Start VMThread, 0.0001566 secs > [0.092s][info][startuptime] Initialize java.lang classes, 0.0101517 secs > [0.094s][info][startuptime] Initialize java.lang.invoke classes, 0.0007457 secs > [0.097s][info][startuptime] StubRoutines generation compiler stubs, 0.0035488 secs > [0.100s][info][startuptime] Initialize module system, 0.0064275 secs > [0.101s][info][startuptime] Create VM, 0.1007149 secs > [0.135s][info][startuptime] Resolve invokehandle, 0.0004090 secs > [0.135s][info][startuptime] Resolve invokehandle, 0.0000021 secs > [0.135s][info][startuptime] Resolve invokehandle, 0.0000009 secs > [0.137s][info][startuptime] Resolve invokehandle, 0.0002939 secs > [0.137s][info][startuptime] 
Resolve invokehandle, 0.0000017 secs > [0.139s][info][startuptime] Resolve invokehandle, 0.0001010 secs > [0.139s][info][startuptime] Resolve invokehandle, 0.0000016 secs > [0.143s][info][startuptime] Resolve invokehandle, 0.0000026 secs > hello, world > > Also, I think `TraceTime` doesn't have a counter to track the number of times the function is called. > > The -Xlog:init (perhaps with a better name/tag!) > How about `-Xlog:initcounters` as the name/tag? It is similar to @dholmes-ora 's suggestion above without the underscore between "init" and "counters". Currently, none of the logging tag contains an underscore character. @iwanowww, @iklam, @dholmes-ora, what do you guys think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1626893440 From dlong at openjdk.org Wed Jun 5 04:15:56 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Jun 2024 04:15:56 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 21:15:07 GMT, Albert Mingkun Yang wrote: >> When building with ubsan, we see a number of overflows at this code location : >> >> /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 >> #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 >> #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 >> #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 >> #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 >> #4 0x10c552f6b in PhaseOutput::Output() output.cpp:367 >> #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 >> #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 >> #7 0x10b859912 in 
C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:142 >> #8 0x10b9dd4f1 in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 >> #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 >> #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 >> #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 >> #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 >> #13 0x10c53ece4 in thread_native_entry(Thread*) os_bsd.cpp:598 > > I wonder if using for-loop works here. (Since #iterations is known, for-loop seems more natural.) > > > for (size_t i = 0; i < count; ++i) { > to[count - 1 - i] = from[count - 1 - i]; > } @albertnetymk, that rewrite seems fine, but at least to me it's less obvious what it does, and I wonder if the C++ compiler generates equivalent code. If we are going to change the loop, here's another alternative: from += count; to += count; while (count-- > 0) { *--to = *--from; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/19541#issuecomment-2148819381 From iklam at openjdk.org Wed Jun 5 04:43:57 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 5 Jun 2024 04:43:57 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> <9k9sDnon7SehCf1oFouQwWKo2YBTlSwZrBTN3e3ezzw=.070cccf2-cfc6-4a51-9bd6-02e17eacc50b@github.com> <2H_eDsrF-8iXVs6DKjKoLzXaMQ0ZlJzXCVu0eQE51U4=.4ac41a7e-6534-4310-9b00-30644b16c358@github.com> Message-ID: <7eca2PnRmrfCFnua7sipAivaGRl_jhOlXqr_1lM5ExU=.e6ad2522-4dad-428c-9a85-4e22c201baa2@github.com> On Wed, 5 Jun 2024 04:02:18 GMT, Calvin Cheung wrote: >> The `TraceTime` doesn't fit 
what we need well. All the function using `TraceTime `will be called only once. >> I tried it in `LinkResolver::resolve_invokehandle` >> >> void LinkResolver::resolve_invokehandle(CallInfo& result, const constantPoolHandle& pool, int index, TRAPS) { >> >> TraceTime trace_timer("Resolve invokehandle", TRACETIME_LOG(Info, startuptime)); >> >> When I run a HelloWorld with `-Xlog:startuptime`, I saw: >> >> [0.008s][info][startuptime] StubRoutines generation initial stubs, 0.0005199 secs >> [0.060s][info][startuptime] Genesis, 0.0519067 secs >> [0.060s][info][startuptime] StubRoutines generation continuation stubs, 0.0000791 secs >> [0.069s][info][startuptime] Interpreter generation, 0.0022987 secs >> [0.082s][info][startuptime] StubRoutines generation final stubs, 0.0012770 secs >> [0.082s][info][startuptime] MethodHandles adapters generation, 0.0001008 secs >> [0.082s][info][startuptime] Start VMThread, 0.0001566 secs >> [0.092s][info][startuptime] Initialize java.lang classes, 0.0101517 secs >> [0.094s][info][startuptime] Initialize java.lang.invoke classes, 0.0007457 secs >> [0.097s][info][startuptime] StubRoutines generation compiler stubs, 0.0035488 secs >> [0.100s][info][startuptime] Initialize module system, 0.0064275 secs >> [0.101s][info][startuptime] Create VM, 0.1007149 secs >> [0.135s][info][startuptime] Resolve invokehandle, 0.0004090 secs >> [0.135s][info][startuptime] Resolve invokehandle, 0.0000021 secs >> [0.135s][info][startuptime] Resolve invokehandle, 0.0000009 secs >> [0.137s][info][startuptime] Resolve invokehandle, 0.0002939 secs >> [0.137s][info][startuptime] Resolve invokehandle, 0.0000017 secs >> [0.139s][info][startuptime] Resolve invokehandle, 0.0001010 secs >> [0.139s][info][startuptime] Resolve invokehandle, 0.0000016 secs >> [0.143s][info][startuptime] Resolve invokehandle, 0.0000026 secs >> hello, world >> >> Also, I think `TraceTime` doesn't have a counter to track the number of times the function is called. 
> >> > The -Xlog:init (perhaps with a better name/tag!) >> > > How about `-Xlog:initcounters` as the name/tag? It is similar to @dholmes-ora 's suggestion above without the underscore between "init" and "counters". Currently, none of the logging tag contains an underscore character. > > @iwanowww, @iklam, @dholmes-ora, what do you guys think? While these counters are useful for start-up measurement, they can be used for other purposes (e.g., monitoring how much time is spend in linking classes in a long running application). So I think the logging tag should be more neutral. We already have `-Xlog:perf`. Maybe we can have sub options like `-Xlog:perf+class+link` for the counters in this PR? I know an upcoming PR will be for stats for MutexLocker. Something like `-Xlog:perf+lock` would work for such stats. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1626932616 From thartmann at openjdk.org Wed Jun 5 05:23:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 5 Jun 2024 05:23:00 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v8] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 03:52:24 GMT, kuaiwei wrote: >> he origin patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: >> 1 It show regression in some platform, like Apple silicon in mac os >> 2 Can not handle instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" >> >> It can be fixed by: >> 1 Enable AlwaysMergeDMB by default, only disable it in architecture we can see performance improvement (N1 or N2) >> 2 Check the special pattern and merge the subsequent dmb. >> >> It also fix a bug when code buffer is expanding, st/ld/dmb can not be merged. I added unit tests for these. >> >> This patch still has a unhandled case. Insts like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and can not merge all three. 
Because when emitting dmb.ish, if merge all previous dmbs, the code buffer will shrink the size. I think it may break some resumption and think it's not a common pattern. >> >> In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting. > > kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge master > - Use constexpr for test encoding > - Add comment in aarch64.ad > - Remove tailing white space > - Refine merge dmb test cases > - Add more unit tests > - Make MacroAssembler::merge more clear > - 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier Given that the first version of the patch had some issues, I would recommend waiting for the fork tomorrow and only integrating this into JDK 24. I'll run this through our correctness and performance testing and report back once it passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2148876300 PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2148877808 From dholmes at openjdk.org Wed Jun 5 06:00:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Jun 2024 06:00:56 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4] In-Reply-To: References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Tue, 4 Jun 2024 07:42:00 GMT, Thomas Stuefe wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified the test code - thanks @tstuefe! >> Rewrote the comment block describing do_vsnprintf. > > Good. Thank you for taking this on. Thanks for the review @tstuefe ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19512#issuecomment-2148923364 From amitkumar at openjdk.org Wed Jun 5 06:44:58 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 5 Jun 2024 06:44:58 GMT Subject: RFR: 8333382: [s390x] Move population_count implementation out of ad file In-Reply-To: References: Message-ID: <3d8lQjpyD47bXYpl2kp_R0uWSiiNdIpJbkyVuI3q0GQ=.10c0e582-43da-4b25-8933-205adde9c3fb@github.com> On Sat, 1 Jun 2024 13:15:45 GMT, Amit Kumar wrote: > We need to move the popcnt instruction implementation out of the s390.ad file, as it is required by some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. These are the benchmark results from a custom benchmark: Without the Patch: Benchmark Mode Cnt Score Error Units PopCount.fineTest avgt 20 1608179.360 ± 140304.896 ns/op With the Patch: Benchmark Mode Cnt Score Error Units PopCount.fineTest avgt 20 762352.534 ± 
84070.285 ns/op This is the benchmark: package org.openjdk.bench.vm.runtime; import org.openjdk.jmh.annotations.*; import java.util.Random; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Thread) @Warmup(iterations = 1, time = 1) @Measurement(iterations = 4, time = 1) @Fork(value = 5) public class PopCount { public Random rand = new Random(); int numTests = 100_000; int[] testNumbers = new int[numTests]; @Setup public void warmup() { long l1 = 1, l2 = 2, l3 = 3, l4 = 4, l5 = 5, l6 = 6, l7 = 7, l8 = 9, l9 = 9, l10 = 10; for (long i = 0; i < numTests; i++) { l1 ^= Long.bitCount(l1) + i; l2 ^= Long.bitCount(l2) + i; l3 ^= Long.bitCount(l3) + i; l4 ^= Long.bitCount(l4) + i; l5 ^= Long.bitCount(l5) + i; l6 ^= Long.bitCount(l6) + i; l7 ^= Long.bitCount(l7) + i; l8 ^= Long.bitCount(l8) + i; l9 ^= Long.bitCount(l9) + i; l10 ^= Long.bitCount(l10) + i; } long x = l1 + l2 + l3 + l4 + l5 + l6 + l7 + l8 + l9 + l10; } @Benchmark public long fineTest() { long l1 = 1, l2 = 2, l3 = 3, l4 = 4, l5 = 5, l6 = 6, l7 = 7, l8 = 9, l9 = 9, l10 = 10; for (long i = 0; i < numTests; i++) { l1 ^= Long.bitCount(l1) + i; l2 ^= Long.bitCount(l2) + i; l3 ^= Long.bitCount(l3) + i; l4 ^= Long.bitCount(l4) + i; l5 ^= Long.bitCount(l5) + i; l6 ^= Long.bitCount(l6) + i; l7 ^= Long.bitCount(l7) + i; l8 ^= Long.bitCount(l8) + i; l9 ^= Long.bitCount(l9) + i; l10 ^= Long.bitCount(l10) + i; } return l1 + l2 + l3 + l4 + l5 + l6 + l7 + l8 + l9 + l10; } } @RealLucy ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19509#issuecomment-2149000386 From varadam at openjdk.org Wed Jun 5 07:00:26 2024 From: varadam at openjdk.org (Varada M) Date: Wed, 5 Jun 2024 07:00:26 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v2] In-Reply-To: References: Message-ID: <4QAGcf4iBInupqh6dYvZ02y6LzpCMrOKXnca74Uny9A=.c85d59eb-4e2c-4f3d-be00-71b2f4cb3269@github.com> > PPC64 uses save/restore CR less often. 
Only LR is critical, CR is mainly needed for native-to-Java calls. > Fastdebug: build and tier1 testing successful. [unrelated failures] Varada M has updated the pull request incrementally with one additional commit since the last revision: [PPC64] saving and restoring CR is not needed at most places ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19494/files - new: https://git.openjdk.org/jdk/pull/19494/files/75b9afa9..6dee9281 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=00-01 Stats: 7 lines in 2 files changed: 0 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19494/head:pull/19494 PR: https://git.openjdk.org/jdk/pull/19494 From varadam at openjdk.org Wed Jun 5 07:03:57 2024 From: varadam at openjdk.org (Varada M) Date: Wed, 5 Jun 2024 07:03:57 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places In-Reply-To: <_oOJA6AJ729zZn5EZpXF-W5h9CfU_IoX5NtY119trrg=.737da5ff-5a8f-42a5-b823-02fb3e000bf4@github.com> References: <_oOJA6AJ729zZn5EZpXF-W5h9CfU_IoX5NtY119trrg=.737da5ff-5a8f-42a5-b823-02fb3e000bf4@github.com> Message-ID: On Tue, 4 Jun 2024 15:02:43 GMT, Martin Doerr wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Please make xBarrierSetAssembler_ppc.cpp consistent with the normal BarrierSetAssembler. "// NOP. Conditions registers are covered by save_LR_CR" should get replaced, too. > Saving and restoring CR in `RegisterSaver::push_frame_reg_args_and_save_live_registers` and `RegisterSaver::restore_live_registers_and_pop_frame` are not needed, either. > Otherwise, this PR looks good to me. I'll run tests. (Will retest when you make updates.) Thank you @TheRealMDoerr . I have made the changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2149030309 From fyang at openjdk.org Wed Jun 5 07:06:57 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Jun 2024 07:06:57 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v4] In-Reply-To: References: Message-ID: <0E8sVpwVMGzxi3n6c3YSXk68xKycQzsb9QqAEq8Tqw4=.8de6afc3-bbd1-4d49-9d0f-b30cd64fdf01@github.com> On Tue, 4 Jun 2024 16:34:34 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. >> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. 
>> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Move shart/far code to cpp Hi, I only have several minor comments for now. Still looking at the code. src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 330: > 328: > 329: enum RISCV_specific_constants { > 330: return_address_offset = 3 * NativeInstruction::instruction_size, // ld auipc jalr Suggestion about the code comment: `// auipc + ld + jalr` src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 636: > 634: call = (NativeCall*)(return_address - NativeFarCall::return_address_offset); > 635: } else { > 636: call = (NativeCall*)(return_address - NativeShortCall::instruction_size); Maybe it's better to have a `return_address_offset` (which equals `NativeShortCall::instruction_size`) const for `NativeShortCall` too? src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 118: > 116: public: > 117: enum { > 118: instruction_size = 3 * Assembler::instruction_size, It looks odd for this `instruction_size` `NativeCall` to have a size of 12 here. It should depends on whether it is a `NativeShortCall` for `NativeFarCall` at the bottom, right? 
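To make the offset discussion in the review comments above concrete, here is a minimal sketch of recovering the start of a call site from its return address. It is not the HotSpot code: the constant and function names are hypothetical, the 4-byte instruction size is inferred from the "size of 12" remark for three instructions, and treating the short call as a single `jal` is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration: a base RISC-V instruction is 4 bytes wide.
constexpr std::uintptr_t kInstructionSize = 4;

// Far call sequence discussed in the thread: auipc + ld + jalr.
constexpr std::uintptr_t kFarCallReturnOffset = 3 * kInstructionSize;

// Short call: assumed here to be a single jal.
constexpr std::uintptr_t kShortCallReturnOffset = 1 * kInstructionSize;

// Returns the address of the first instruction of the call whose return
// address is `return_address`. The caller must already know which kind of
// call it is; this sketch does not decode instructions.
inline std::uintptr_t call_start(std::uintptr_t return_address, bool is_far_call) {
  const std::uintptr_t offset =
      is_far_call ? kFarCallReturnOffset : kShortCallReturnOffset;
  assert(return_address >= offset && "return address too small for this call kind");
  return return_address - offset;
}
```

Giving each call kind its own named return offset, as suggested in the review, keeps the subtraction at the use site symmetric between the two cases.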
------------- PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2098062531 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1626976092 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1627098898 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1627095666 From rehn at openjdk.org Wed Jun 5 07:18:57 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 07:18:57 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v4] In-Reply-To: <0E8sVpwVMGzxi3n6c3YSXk68xKycQzsb9QqAEq8Tqw4=.8de6afc3-bbd1-4d49-9d0f-b30cd64fdf01@github.com> References: <0E8sVpwVMGzxi3n6c3YSXk68xKycQzsb9QqAEq8Tqw4=.8de6afc3-bbd1-4d49-9d0f-b30cd64fdf01@github.com> Message-ID: <-KsbrqGrBHviBPsjYCvlmpPGVXbZfMUswRU_o45s6Gs=.67c2fb5d-d701-4fa2-8a7c-811a410bb255@github.com> On Wed, 5 Jun 2024 06:58:40 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Move shart/far code to cpp > > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 118: > >> 116: public: >> 117: enum { >> 118: instruction_size = 3 * Assembler::instruction_size, > > It looks odd for this `instruction_size` `NativeCall` to have a size of 12 here. It should depends on whether it is a `NativeShortCall` or `NativeFarCall` at the bottom, right? The issue is that the common code access the enum directly: src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_handler_begin() + NativeCall::instruction_size)) src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_mh_handler_begin() + NativeCall::instruction_size)) src/hotspot/share/opto/output.cpp: int pad_req = NativeCall::instruction_size; For c2 it just for an size estimate, so it's fine to give a larger value. JVM CI is buggy (in general it's buggy all over), as we have no clue what size the unknown compiler using JVM CI generates here. 
We can't use a constant for a C1/C2 call, so this code is just wrong. Not sure what to do about it; I have seen no errors from it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1627125905 From clanger at openjdk.org Wed Jun 5 07:21:59 2024 From: clanger at openjdk.org (Christoph Langer) Date: Wed, 5 Jun 2024 07:21:59 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 14:11:42 GMT, Matthias Baesken wrote: > When building with ubsan, we see a number of overflows at this code location: > > /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 > #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 > #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 > #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 > #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 > #4 0x10c552f6b in PhaseOutput::Output() output.cpp:367 > #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 > #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 > #7 0x10b859912 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:142 > #8 0x10b9dd4f1 in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 > #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 > #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 > #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 > #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 > #13 0x10c53ece4 in thread_native_entry(Thread*) os_bsd.cpp:598 Marked as reviewed by clanger (Reviewer). 
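For readers following the ubsan report quoted above, here is a minimal sketch of a backward word copy that indexes from the end instead of first advancing the pointers. It is not the code in copy.hpp: `Word` and `copy_words_to_higher` are hypothetical names, and the sketch assumes the reported overflow comes from adding `count - 1` to a pointer when an unsigned `count` is zero (the reported addresses differ by exactly one 8-byte word, which is consistent with that, but the thread does not show the original loop).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for HotSpot's HeapWord; not the real type.
using Word = std::uintptr_t;

// Copies `count` words from `from` to `to`, highest index first, so that an
// overlapping destination at a higher address never overwrites source words
// that have not been read yet. Because the loop indexes with the already
// decremented `count`, no pointer before the start of either array is ever
// formed, even when `count` is zero.
inline void copy_words_to_higher(const Word* from, Word* to, std::size_t count) {
  assert(count == 0 || (from != nullptr && to != nullptr));
  while (count-- > 0) {
    to[count] = from[count];
  }
}
```

With `count == 0` the loop body is skipped and neither pointer is touched, so no separate early return is needed in this form.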
------------- PR Review: https://git.openjdk.org/jdk/pull/19541#pullrequestreview-2098246327 From sspitsyn at openjdk.org Wed Jun 5 07:23:57 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 5 Jun 2024 07:23:57 GMT Subject: RFR: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive [v2] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 15:12:39 GMT, Sonia Zaldana Calles wrote: >> Hi folks, >> >> This PR addresses [8332785](https://bugs.openjdk.org/browse/JDK-8332785) replacing all naked uses for ```UseSharedSpaces``` with ```CDSConfig::is_using_archive```. >> >> Testing: >> - [x] Tier 1 with GHA. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating copyright headers and unnecessary CDSConfig:: Looks good. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19463#pullrequestreview-2098250154 From stefank at openjdk.org Wed Jun 5 07:44:58 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 5 Jun 2024 07:44:58 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 14:11:42 GMT, Matthias Baesken wrote: > When building with ubsan, we see a number of overflows at this code location : > > /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 > #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 > #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 > #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 > #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 > #4 0x10c552f6b in PhaseOutput::Output() 
output.cpp:367 > #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 > #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 > #7 0x10b859912 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:142 > #8 0x10b9dd4f1 in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 > #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 > #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 > #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 > #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 > #13 0x10c53ece4 in thread_native_entry(Thread*) os_bsd.cpp:598 FWIW, I was also thinking that this could be written in another way, but I held off because I didn't want to derail yet another ubsan review. :) The reasons why I would have preferred this to be written another way are: 1) The inserted bail-out is placed just before the assert block. This makes the function have a different structure compared to the other functions that place the invariant checks first. I prefer to keep code consistent. 2) I'm really not a fan of if statements with returns to the right. It makes it much harder to see the return, IMHO. It's as if we want to hide the return instead of showing it prominently. Now that people have been given alternatives, I can say that I first considered @dean-long's version. It has a drawback that the `from` and `to` names are slightly off given that they point to one beyond the current from and to elements. (With that said, this function already uses `count` as both the count and a loop variable, so I'm not sure that would be worse). @albertnetymk's version is interesting, but I agree with Dean that it is less obvious. 
I wonder if this could be written something like this: while (count-- > 0) { to[count] = from[count]; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/19541#issuecomment-2149101681 From jsjolen at openjdk.org Wed Jun 5 07:57:21 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 5 Jun 2024 07:57:21 GMT Subject: Integrated: 8312132: Add tracking of multiple address spaces in NMT In-Reply-To: References: Message-ID: On Wed, 13 Mar 2024 21:52:58 GMT, Johan Sjölen wrote: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and commit memory to these. The basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, 
size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... This pull request has now been integrated. 
Changeset: 3944e673 Author: Johan Sj?len URL: https://git.openjdk.org/jdk/commit/3944e67366601b6f748df1c5f93f184a7cb23ec3 Stats: 2357 lines in 21 files changed: 2252 ins; 86 del; 19 mod 8312132: Add tracking of multiple address spaces in NMT Co-authored-by: Thomas Stuefe Reviewed-by: stefank, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/18289 From mbaesken at openjdk.org Wed Jun 5 08:00:27 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 5 Jun 2024 08:00:27 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 [v2] In-Reply-To: References: Message-ID: > When building with ubsan, we see a number of overflows at this code location : > > /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 > #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 > #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 > #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 > #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 > #4 0x10c552f6b in PhaseOutput::Output() output.cpp:367 > #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 > #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 > #7 0x10b859912 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:142 > #8 0x10b9dd4f1 in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 > #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 > #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 > #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 > #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 > #13 0x10c53ece4 in thread_native_entry(Thread*) 
os_bsd.cpp:598 Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: move check after assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19541/files - new: https://git.openjdk.org/jdk/pull/19541/files/1f8ea858..70d09108 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19541&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19541&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19541.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19541/head:pull/19541 PR: https://git.openjdk.org/jdk/pull/19541 From mbaesken at openjdk.org Wed Jun 5 08:00:28 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 5 Jun 2024 08:00:28 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 07:41:51 GMT, Stefan Karlsson wrote: > The inserted bail-out is placed just before the assert block. This makes the function have a different structure compared to the other functions that palace the invariant checks first. I prefer to keep code consistent. Moved the check after the asserts. A loop improvement can be done in a separate PR (maybe another loop is more or less efficient, might need more testing) . ------------- PR Comment: https://git.openjdk.org/jdk/pull/19541#issuecomment-2149133785 From rehn at openjdk.org Wed Jun 5 08:08:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 08:08:17 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v5] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. 
> But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. > > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... 
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19453/files - new: https://git.openjdk.org/jdk/pull/19453/files/c4c02f2e..193a9343 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=03-04 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From fyang at openjdk.org Wed Jun 5 08:18:59 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 5 Jun 2024 08:18:59 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v4] In-Reply-To: <-KsbrqGrBHviBPsjYCvlmpPGVXbZfMUswRU_o45s6Gs=.67c2fb5d-d701-4fa2-8a7c-811a410bb255@github.com> References: <0E8sVpwVMGzxi3n6c3YSXk68xKycQzsb9QqAEq8Tqw4=.8de6afc3-bbd1-4d49-9d0f-b30cd64fdf01@github.com> <-KsbrqGrBHviBPsjYCvlmpPGVXbZfMUswRU_o45s6Gs=.67c2fb5d-d701-4fa2-8a7c-811a410bb255@github.com> Message-ID: On Wed, 5 Jun 2024 07:16:22 GMT, Robbin Ehn wrote: > The issue is that the common code access the enum directly: > > ``` > src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_handler_begin() + NativeCall::instruction_size)) > src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_mh_handler_begin() + NativeCall::instruction_size)) > src/hotspot/share/opto/output.cpp: int pad_req = NativeCall::instruction_size; > ``` > > For c2 it just for an size estimate, so it's fine to give a larger value. Seems that hotspot shared code have some assumptions about the size of NativeCall. While it is OK for this c2 place, I am still a bit worried that we may have more uses of this const in hotspot shared code in future. 
> JVM CI is buggy (in general it's buggy all over), as we have no clue what size the unknown compiler using JVM CI generates here. We can't use a constant for a C1/C2 call, so this code is just wrong. Not sure what to do about it; I have seen no errors from it. JVMCI is only *partially* supported on riscv for now and seems to lack love (see bug [1]). I am not sure how this change will affect it. [1] https://bugs.openjdk.org/browse/JDK-8290154 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1627227642 From aboldtch at openjdk.org Wed Jun 5 08:21:58 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 5 Jun 2024 08:21:58 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: On Tue, 4 Jun 2024 15:40:41 GMT, Coleen Phillimore wrote: >> src/hotspot/share/classfile/symbolTable.cpp line 185: >> >>> 183: private: >>> 184: static void* allocate_node_impl(size_t size, Value const& value) { >>> 185: size_t alloc_size = SymbolTableHash::get_dynamic_node_size(value.byte_size()); >> >> So 'size' passed in is actually sizeof(NODE), right or is it sizeof(NODE) + sizeof(VALUE) ? So also to fix it, don't we just remove value.byte_size() from this calculation? > > The real problem is that size passed in is known to the concurrent hash table but not to the caller, so maybe this does help make this less error prone. > So 'size' passed in is actually sizeof(NODE), right or is it sizeof(NODE) + sizeof(VALUE) ? It is `sizeof(NODE)`, which for types that do not have exotic alignment ends up being `sizeof(void*) /* node metadata */ + sizeof(VALUE) /* payload/data */` > So also to fix it, don't we just remove value.byte_size() from this calculation? That is correct, but it does require implementation details of CHT to leak all the way into SymbolTableConfig. 
_There is an implementation of CHT that does not care about the storage requirements of the VALUE, nor needs to know how to create it. One that only works with `void*` and off loads not only the allocation, but also the construction and destruction to the Config. (Would also require more care with alignment)_ The attempt with this patch is to create an abstraction where the Config does not need to know about the implementation details of the CHT node. So now: * `static void* allocate_node(void* context, size_t size, Value const& value)` allows the CHT to query the Config for an allocation of at least `size` into which the value object will be copy constructed. * If the `Value` is dynamically sized (has extended storage beyond the end of the C++ object) the Config can query the CHT for an updated `size` given the `value`'s dynamic size requirements. * This also enforces that the Value's type and object layout is compatible with the CHT's Node implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1627231890 From aboldtch at openjdk.org Wed Jun 5 08:21:59 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 5 Jun 2024 08:21:59 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: On Tue, 4 Jun 2024 15:05:46 GMT, Johan Sj?len wrote: >> The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. 
The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` >> >> Where >> * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` >> * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` >> * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` >> >> So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` >> >> Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. >> >> The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. >> >> Running testing tier1-7 > > src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1065: > >> 1063: assert(value_size >= sizeof(VALUE), "must include the VALUE"); >> 1064: return sizeof(Node) - sizeof(VALUE) + value_size; >> 1065: } > > Style, suggestion: > ```c++ > template > inline size_t ConcurrentHashTable::get_dynamic_node_size(size_t value_size) { > > > That's 88 characters, well within what we accept. This choice was not based on line length. But on the fact that this whole file follows the style: ```C++ template inline ReturnType ConcurrentHashTable:: function_name(...) 
{ // Function Body } This is the only file in HotSpot which has this style, but it is also consistent throughout the file. I would rather not introduce inconsistent styles. Harmonizing CHTs C++ style with the rest of the HotSpot code base is a different RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1627232077 From aboldtch at openjdk.org Wed Jun 5 08:22:00 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 5 Jun 2024 08:22:00 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: <0KXxuRakwz3UxORygW7DWPyuX5rLqJC7T0GFAbwyfAk=.6c9a578e-c71e-40fe-b520-594e8afbd508@github.com> On Tue, 4 Jun 2024 15:37:15 GMT, Coleen Phillimore wrote: >> The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` >> >> Where >> * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` >> * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` >> * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` >> >> So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` >> >> Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. >> >> The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. 
(effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. >> >> Running testing tier1-7 > > src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1065: > >> 1063: assert(value_size >= sizeof(VALUE), "must include the VALUE"); >> 1064: return sizeof(Node) - sizeof(VALUE) + value_size; >> 1065: } > > I'm not sure if this makes it less error prone, if I'm right that the size passed in is sizeof(Node) + sizeof(VALUE). `size_t value_size` is supposed to be the dynamic size of VALUE. What is returned is the allocation size required to construct a CHT node which contains a VALUE with that dynamic size. For the only current use of dynamically sized VALUE types, i.e. the SymbolTable, we call this with `Symbol::byte_size()`, which is the dynamic size of that symbol. Not sure I understand exactly what you mean by: > if I'm right that the size passed in is sizeof(Node) + sizeof(VALUE). The `sizeof(Node) - sizeof(VALUE)` part of the calculation is a CHT implementation detail. It answers the question: excluding the data / payload, how big is the node? Or, more plainly, it is the size of the node metadata. Then we add the size of the data / payload, to end up with the size required for the node metadata plus the data / payload. Any user of CHT with dynamic types needs to know about `CHT::get_dynamic_node_size(size_t)`, so in that sense it is still error prone. But the users do not have to know about the implementation of CHT to allocate a node. All they need to know is how much memory the instance of a dynamically sized type will require. (Some of the CHT implementation leaks through the restrictions it imposes, such as `sizeof(VALUE) % alignof(void*) == 0` / `offset_of(Node, _value) + sizeof(_value) == sizeof(Node)`.) 
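The size arithmetic described above can be sketched in isolation. This is only an illustration under assumptions, not the ConcurrentHashTable code: `Value`, `Node`, `dynamic_node_size` and `old_alloc_size` are hypothetical stand-ins, with `old_alloc_size` reproducing the double-counting formula quoted in the PR description.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical value type with a variable-length tail, standing in for Symbol.
struct Value {
  unsigned length;   // number of payload bytes
  char     body[4];  // first payload bytes; longer payloads extend past the object
};

// Hypothetical node layout: metadata first, value last.
struct Node {
  Node* next;   // node metadata
  Value value;  // payload, possibly with extended storage after it
};

// The restriction mentioned in the thread: the value must end exactly where
// the node ends, so the dynamic payload starts right after the node object.
static_assert(offsetof(Node, value) + sizeof(Value) == sizeof(Node),
              "value must be the last thing in the node, without tail padding");

// Allocation size for a node whose value needs `value_size` bytes in total
// (static part plus dynamic tail): metadata size plus value size.
inline std::size_t dynamic_node_size(std::size_t value_size) {
  assert(value_size >= sizeof(Value) && "must include the Value itself");
  return sizeof(Node) - sizeof(Value) + value_size;
}

// The earlier calculation as described in the PR text: it adds the dynamic
// value size on top of a node size that already contains a Value.
inline std::size_t old_alloc_size(std::size_t value_size) {
  const std::size_t effective_length = value_size - sizeof(Value);
  return sizeof(Node) + value_size + effective_length;
}
```

In this sketch the old formula over-allocates by exactly the dynamic value size, which matches the "node metadata + 2 * dynamic size" result stated in the description.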
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19214#discussion_r1627232232 From mdoerr at openjdk.org Wed Jun 5 08:24:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Jun 2024 08:24:57 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v2] In-Reply-To: <4QAGcf4iBInupqh6dYvZ02y6LzpCMrOKXnca74Uny9A=.c85d59eb-4e2c-4f3d-be00-71b2f4cb3269@github.com> References: <4QAGcf4iBInupqh6dYvZ02y6LzpCMrOKXnca74Uny9A=.c85d59eb-4e2c-4f3d-be00-71b2f4cb3269@github.com> Message-ID: On Wed, 5 Jun 2024 07:00:26 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > [PPC64] saving and restoring CR is not needed at most places Thanks for the updates! Sorry, my assumption is wrong. I have found a test which runs into the `ShouldNotReachHere()`: jdk/incubator/vector/VectorMaxConversionTests.java#ZGenerational I think we should revert all changes in `SaveLiveRegisters` (barrierSetAssembler_ppc.cpp) and `XSaveLiveRegisters` (xBarrierSetAssembler_ppc.cpp). We can keep them saving and restoring CR. Sorry for not finding this earlier. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2149186342 From stuefe at openjdk.org Wed Jun 5 08:40:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 5 Jun 2024 08:40:26 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v126] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 19:26:28 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. 
This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> ``` >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
>> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Add 'from' in report Wow. Thanks to everyone involved! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2149215888 From ayang at openjdk.org Wed Jun 5 09:42:56 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Jun 2024 09:42:56 GMT Subject: RFR: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 07:57:18 GMT, Matthias Baesken wrote: > A loop improvement can be done in a separate PR Given there is no consensus on how the new loop should look, it makes sense to defer that. I also think not placing `return` on its own line complicates the code unnecessarily. Ofc, this is super subjective. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19541#issuecomment-2149347942 From wanghaomin at openjdk.org Wed Jun 5 10:11:02 2024 From: wanghaomin at openjdk.org (Wang Haomin) Date: Wed, 5 Jun 2024 10:11:02 GMT Subject: RFR: 8329421: Native methods can not be selectively printed [v2] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 07:23:25 GMT, Volker Simonis wrote: >> Native methods (i.e. "native wrappers") can not be selectively printed with `-XX:CompileCommand=print,class::method`. Currently the only way to print native methods is to use the global `-XX:+PrintAssembly` option. But this prints *all* compiled methods which can be too much if we're just interested in a specific native wrapper. 
There's no reason to not apply `-XX:CompileCommand` options correctly to native methods as well. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Add test for -XX:+PrintNativeNMethods Hi @simonis, I encountered a build error after this commit. When running `make images` with the configure option `--with-jvm-variants=core`, the error is as follows: # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x0000fffdea9ad1f4, pid=1696170, tid=1696241 # # JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-adhoc.wanghaomin.jdk-ls) # Java VM: OpenJDK 64-Bit Core VM (fastdebug 23-internal-adhoc.wanghaomin.jdk-ls, interpreted mode, compressed oops, compressed class ptrs, serial gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0x8ed1f4] BasicMatcher::match(methodHandle const&)+0x15c # # Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/wanghaomin/jdk-ls/make/core.1696170) # # An error report file with more information is saved as: # /home/wanghaomin/jdk-ls/make/hs_err_pid1696170.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # /usr/bin/bash: line 1: 1696170 Aborted (core dumped) /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/jdk/bin/javac -g -Xlint:all -source 23 -target 23 -implicit:none -Xprefer:source -XDignore.symbol.file=true -encoding ascii -Werror --add-modules jdk.compiler,jdk.jdeps --add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.jvm=ALL-UNNAMED --add-exports jdk.jdeps/com.sun.tools.classfile=ALL-UNNAMED -Xlint:-options 
-XDmodifiedInputs=/home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch.modfiles.fixed -d /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac @/home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS _batch.filelist > >(/usr/bin/tee -a /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch.log) 2> >(/usr/bin/tee -a /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch.log >&2) gmake[3]: *** [Gendata.gmk:64: /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch] Error 134 gmake[2]: *** [make/Main.gmk:147: jdk.compiler-gendata] Error 2 gmake[2]: *** Waiting for unfinished jobs.... ERROR: Build failed for targets 'clean images' in configuration 'linux-aarch64-core-fastdebug' (exit code 2) Stopping javac server Could you solve this problem? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18567#issuecomment-2149408243 From varadam at openjdk.org Wed Jun 5 10:24:23 2024 From: varadam at openjdk.org (Varada M) Date: Wed, 5 Jun 2024 10:24:23 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v3] In-Reply-To: References: Message-ID: > PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. > Fastdebug: build and tier1 testing successful. 
[unrelated failures] Varada M has updated the pull request incrementally with one additional commit since the last revision: [PPC64] saving and restoring CR is not needed at most places ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19494/files - new: https://git.openjdk.org/jdk/pull/19494/files/6dee9281..f01de445 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19494/head:pull/19494 PR: https://git.openjdk.org/jdk/pull/19494 From varadam at openjdk.org Wed Jun 5 10:24:23 2024 From: varadam at openjdk.org (Varada M) Date: Wed, 5 Jun 2024 10:24:23 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v2] In-Reply-To: References: <4QAGcf4iBInupqh6dYvZ02y6LzpCMrOKXnca74Uny9A=.c85d59eb-4e2c-4f3d-be00-71b2f4cb3269@github.com> Message-ID: On Wed, 5 Jun 2024 08:22:24 GMT, Martin Doerr wrote: > Thanks for the updates! Sorry, my assumption is wrong. I have found a test which runs into the `ShouldNotReachHere()`: jdk/incubator/vector/VectorMaxConversionTests.java#ZGenerational I think we should revert all changes in `SaveLiveRegisters` (barrierSetAssembler_ppc.cpp) and `XSaveLiveRegisters` (xBarrierSetAssembler_ppc.cpp). We can keep them saving and restoring CR. Sorry for not finding this earlier. No problem @TheRealMDoerr . Thanks for fix. I have reverted the changes and now the test jdk/incubator/vector/VectorMaxConversionTests.java is passing. 
I will run tier1 one more time ------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2149435569 From mdoerr at openjdk.org Wed Jun 5 10:29:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Jun 2024 10:29:57 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v3] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 10:24:23 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > [PPC64] saving and restoring CR is not needed at most places Please also revert the two ShouldNotReachHere changes. The old comments are correct again, now. Note that VectorMaxConversionTests.java is in tier3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2149446914 From varadam at openjdk.org Wed Jun 5 10:35:26 2024 From: varadam at openjdk.org (Varada M) Date: Wed, 5 Jun 2024 10:35:26 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v4] In-Reply-To: References: Message-ID: > PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. > Fastdebug: build and tier1 testing successful. 
[unrelated failures] Varada M has updated the pull request incrementally with one additional commit since the last revision: [PPC64] saving and restoring CR is not needed at most places ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19494/files - new: https://git.openjdk.org/jdk/pull/19494/files/f01de445..3445e9ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19494/head:pull/19494 PR: https://git.openjdk.org/jdk/pull/19494 From mdoerr at openjdk.org Wed Jun 5 10:41:05 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Jun 2024 10:41:05 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v4] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 10:35:26 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > [PPC64] saving and restoring CR is not needed at most places Looks good, now. Thanks. I'll rerun the tests. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19494#pullrequestreview-2098741697 From rehn at openjdk.org Wed Jun 5 10:43:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 10:43:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v4] In-Reply-To: References: <0E8sVpwVMGzxi3n6c3YSXk68xKycQzsb9QqAEq8Tqw4=.8de6afc3-bbd1-4d49-9d0f-b30cd64fdf01@github.com> <-KsbrqGrBHviBPsjYCvlmpPGVXbZfMUswRU_o45s6Gs=.67c2fb5d-d701-4fa2-8a7c-811a410bb255@github.com> Message-ID: <0EEglOwns8nryuDNewNwDW9SIA0lvNOnBbpx0twk8vM=.f147088a-c76c-4508-859f-12241dd38af9@github.com> On Wed, 5 Jun 2024 08:16:13 GMT, Fei Yang wrote: >> The issue is that the common code accesses the enum directly: >> >> src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_handler_begin() + NativeCall::instruction_size)) >> src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_mh_handler_begin() + NativeCall::instruction_size)) >> src/hotspot/share/opto/output.cpp: int pad_req = NativeCall::instruction_size; >> >> >> >> For C2 it is just for a size estimate, so it's fine to give a larger value. >> >> JVMCI is buggy (in general it's buggy all over), as we have no clue what size the unknown compiler using JVMCI generates here. >> We can't use a constant for a C1/C2 call, so this code is just wrong. >> Not sure what to do about it; I have seen no errors for it. > >> The issue is that the common code accesses the enum directly: >> >> ``` >> src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_handler_begin() + NativeCall::instruction_size)) >> src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_mh_handler_begin() + NativeCall::instruction_size)) >> src/hotspot/share/opto/output.cpp: int pad_req = NativeCall::instruction_size; >> ``` >> >> For C2 it is just for a size estimate, so it's fine to give a larger value. 
> > Seems that HotSpot shared code has some assumptions about the size of NativeCall. While it is OK for this C2 place, I am still a bit worried that we may have more uses of this const in HotSpot shared code in the future. > >> JVMCI is buggy (in general it's buggy all over), as we have no clue what size the unknown compiler using JVMCI generates here. We can't use a constant for a C1/C2 call, so this code is just wrong. Not sure what to do about it; I have seen no errors for it. > > JVMCI is only *partially* supported on riscv for now and seems to lack love (See bug [1]). > I am not sure how this change will affect them. > > [1] https://bugs.openjdk.org/browse/JDK-8290154 Yea, we should probably add a `NativeCall::instruction_size()`, but that limits us to a fixed size. It would really be good if we could have a variable size, e.g. look up the size via the pc desc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1627480938 From rehn at openjdk.org Wed Jun 5 10:49:56 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 10:49:56 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v4] In-Reply-To: <0EEglOwns8nryuDNewNwDW9SIA0lvNOnBbpx0twk8vM=.f147088a-c76c-4508-859f-12241dd38af9@github.com> References: <0E8sVpwVMGzxi3n6c3YSXk68xKycQzsb9QqAEq8Tqw4=.8de6afc3-bbd1-4d49-9d0f-b30cd64fdf01@github.com> <-KsbrqGrBHviBPsjYCvlmpPGVXbZfMUswRU_o45s6Gs=.67c2fb5d-d701-4fa2-8a7c-811a410bb255@github.com> <0EEglOwns8nryuDNewNwDW9SIA0lvNOnBbpx0twk8vM=.f147088a-c76c-4508-859f-12241dd38af9@github.com> Message-ID: On Wed, 5 Jun 2024 10:39:59 GMT, Robbin Ehn wrote: >>> The issue is that the common code accesses the enum directly: >>> >>> ``` >>> src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_handler_begin() + NativeCall::instruction_size)) >>> src/hotspot/share/code/nmethod.inline.hpp: || (is_compiled_by_jvmci() && pc == (deopt_mh_handler_begin() + NativeCall::instruction_size)) >>> 
src/hotspot/share/opto/output.cpp: int pad_req = NativeCall::instruction_size; >>> ``` >>> >>> For C2 it is just for a size estimate, so it's fine to give a larger value. >> >> Seems that HotSpot shared code has some assumptions about the size of NativeCall. While it is OK for this C2 place, I am still a bit worried that we may have more uses of this const in HotSpot shared code in the future. >> >>> JVMCI is buggy (in general it's buggy all over), as we have no clue what size the unknown compiler using JVMCI generates here. We can't use a constant for a C1/C2 call, so this code is just wrong. Not sure what to do about it; I have seen no errors for it. >> >> JVMCI is only *partially* supported on riscv for now and seems to lack love (See bug [1]). >> I am not sure how this change will affect them. >> >> [1] https://bugs.openjdk.org/browse/JDK-8290154 > > Yea, we should probably add a `NativeCall::instruction_size()`, but that limits us to a fixed size. > It would really be good if we could have a variable size, e.g. look up the size via the pc desc. And the underlying problem should be fixed: `// When using JVMCI the address might be off by the size of a call instruction.` Poking around and guesstimating is not the right thing to do. It seems like this only affects frame::safe_for_sender, so it seems like Graal is not setting up frames as expected; otherwise this sender_pc would always be correct. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1627492220 From thartmann at openjdk.org Wed Jun 5 11:21:09 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 5 Jun 2024 11:21:09 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12] In-Reply-To: <2HF_LGpK7B6i1UcgJ8g9JgzGF27gVAHZkGnVQk-Fo4w=.98339735-cd89-4059-a449-6a4911e9af41@github.com> References: <2HF_LGpK7B6i1UcgJ8g9JgzGF27gVAHZkGnVQk-Fo4w=.98339735-cd89-4059-a449-6a4911e9af41@github.com> Message-ID: On Wed, 22 May 2024 14:19:36 GMT, Volodymyr Paprotski wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/master' into ecc-montgomery >> - shenandoah verifier >> - comments from Sandhya >> - whitespace >> - add message back >> - whitespace >> - Use AffinePoint to exit Montgomery domain >> >> Style notes: >> Affine.equals() >> - Mismatched fields only appear to be used from testing, perhaps should be moved there instead >> Affine.getX(boolean)|getY(boolean) >> - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks >> - needed to convert Affine to Projective (need to stay in montgomery domain) >> ECOperations.PointMultiplier >> - changes could probably be restored to original (since ProjectivePoint handling no longer required) >> - consider these changes an improvement? (fewer nested classes) >> - was an inner-class but not using inner-class features (i.e. ecOps variable should be converted) >> - whitespace >> - Comments from Tony and Jatin >> - Comments from Jatin and Tony >> - ... and 7 more: https://git.openjdk.org/jdk/compare/1adfff34...b1a33004 > > Thanks Tobi! 
Unfortunately, this caused a performance regression, see [JDK-8333583](https://bugs.openjdk.org/browse/JDK-8333583). @vpaprotsk, please have a look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2149576062 From rehn at openjdk.org Wed Jun 5 12:38:58 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 12:38:58 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v4] In-Reply-To: References: <0E8sVpwVMGzxi3n6c3YSXk68xKycQzsb9QqAEq8Tqw4=.8de6afc3-bbd1-4d49-9d0f-b30cd64fdf01@github.com> <-KsbrqGrBHviBPsjYCvlmpPGVXbZfMUswRU_o45s6Gs=.67c2fb5d-d701-4fa2-8a7c-811a410bb255@github.com> <0EEglOwns8nryuDNewNwDW9SIA0lvNOnBbpx0twk8vM=.f147088a-c76c-4508-859f-12241dd38af9@github.com> Message-ID: On Wed, 5 Jun 2024 10:46:01 GMT, Robbin Ehn wrote: >> Yea, we should probably add a `NativeCall::instruction_size()`, but that limits us to a fixed size. >> It would really be good if we could have a variable size, e.g. look up the size via the pc desc. > > And the underlying problem should be fixed: > `// When using JVMCI the address might be off by the size of a call instruction.` > Poking around and guesstimating is not the right thing to do. > > It seems like this only affects frame::safe_for_sender, so it seems like Graal is not setting up frames as expected; otherwise this sender_pc would always be correct. 
Sent out: https://github.com/openjdk/jdk/pull/19556 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1627679651 From mbaesken at openjdk.org Wed Jun 5 12:48:02 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 5 Jun 2024 12:48:02 GMT Subject: Integrated: 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 14:11:42 GMT, Matthias Baesken wrote: > When building with ubsan, we see a number of overflows at this code location : > > /jdk/src/hotspot/share/utilities/copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 > #0 0x10b70896d in Copy::conjoint_words_to_higher(HeapWordImpl* const*, HeapWordImpl**, unsigned long) copy.hpp:218 > #1 0x10c4f78f1 in Node_Array::insert(unsigned int, Node*) node.cpp:2783 > #2 0x10b8a1386 in Block::insert_node(Node*, unsigned int) block.hpp:134 > #3 0x10c556630 in PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*) output.cpp:1792 > #4 0x10c552f6b in PhaseOutput::Output() output.cpp:367 > #5 0x10b9ba859 in Compile::Code_Gen() compile.cpp:3035 > #6 0x10b9b7cb1 in Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*) compile.cpp:896 > #7 0x10b859912 in C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*) c2compiler.cpp:142 > #8 0x10b9dd4f1 in CompileBroker::invoke_compiler_on_method(CompileTask*) compileBroker.cpp:2305 > #9 0x10b9dc345 in CompileBroker::compiler_thread_loop() compileBroker.cpp:1963 > #10 0x10bfd5ebf in JavaThread::thread_main_inner() javaThread.cpp:760 > #11 0x10bfd5b62 in JavaThread::run() javaThread.cpp:745 > #12 0x10c9310d6 in Thread::call_run() thread.cpp:221 > #13 0x10c53ece4 in thread_native_entry(Thread*) os_bsd.cpp:598 This pull request has now been integrated. 
Changeset: 2c1b311f Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/2c1b311f81319cee1af574526a91424c2577b78c Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8331854: ubsan: copy.hpp:218:10: runtime error: addition of unsigned offset to 0x7fc2b4024518 overflowed to 0x7fc2b4024510 Reviewed-by: kvn, clanger ------------- PR: https://git.openjdk.org/jdk/pull/19541 From rehn at openjdk.org Wed Jun 5 13:09:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 13:09:37 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v6] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running for a very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get hot fast calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. 
> > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which Arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Prepare for dynamic NativeCall size - Only allow one calling convetion, i.e. fixed sized - Merge branch 'master' into 8332689 - Review comments - Move shart/far code to cpp - Cleanup - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Remove accidental files - Remove accidental files - ... 
and 1 more: https://git.openjdk.org/jdk/compare/3cbdf8d4...d9e55450 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=05 Stats: 907 lines in 17 files changed: 652 ins; 161 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From rehn at openjdk.org Wed Jun 5 13:09:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 13:09:37 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines In-Reply-To: <2vYlP7K0d8TQe5OKzTZA33rSAWDjIQSsoO8obofaMos=.9da50618-ed09-4e1b-954f-6199b3e34745@github.com> References: <2vYlP7K0d8TQe5OKzTZA33rSAWDjIQSsoO8obofaMos=.9da50618-ed09-4e1b-954f-6199b3e34745@github.com> Message-ID: <3yDCCvPRIzfaquDQPH-_lYvvzaGIfmdll5EojG5GUrQ=.36e7c72e-ae8f-4e17-9ce5-67568869d1aa@github.com> On Fri, 31 May 2024 06:02:56 GMT, Fei Yang wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. 
>> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which Arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Interesting. I will take a look and play with it on my machines. I sent a patch addressing @RealFYang's concern (https://github.com/openjdk/jdk/pull/19556). I also updated the code in preparation for that, so the above PR goes in first, and when I merge with it the concern should be addressed. (We are assuming a JVMCI compiler will respect the calling convention flag UseTrampolines.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2149834918 From rehn at openjdk.org Wed Jun 5 13:14:35 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Jun 2024 13:14:35 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: Message-ID: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). 
> Using a very small application or running for a very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get hot fast calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which Arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... 
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Remove tmp file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19453/files - new: https://git.openjdk.org/jdk/pull/19453/files/d9e55450..752796b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=05-06 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From stuefe at openjdk.org Wed Jun 5 13:41:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 5 Jun 2024 13:41:20 GMT Subject: RFR: 8322475: Extend printing for System.map [v2] In-Reply-To: References: Message-ID: On Thu, 22 Feb 2024 13:36:16 GMT, Thomas Stuefe wrote: >> This is an expansion on the new `System.map` command introduced with JDK-8318636. >> >> We now print valuable information per memory region, such as: >> >> - the actual resident set size >> - the actual number of huge pages >> - the actual used page size >> - the THP state of the region (was advised, is eligible, uses THP, ...) >> - whether the region is shared >> - whether the region had been committed (backed by swap) >> - whether the region has been swapped out. 
>> >> Example output: >> >> >> from to size rss hugetlb pgsz prot notes vm info/file >> 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') >> 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'non-profiled nmethods') ... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. Exhuming this after a pause. Reworked output, simplified code. Also clearly stated the potential costs of this command. Example output: [system-map-thp1.txt](https://github.com/user-attachments/files/15587748/system-map-thp1.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2149956137 From amitkumar at openjdk.org Wed Jun 5 13:41:03 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 5 Jun 2024 13:41:03 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v4] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 10:35:26 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. 
>> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > [PPC64] saving and restoring CR is not needed at most places Please update the copyright headers as well. ------------- Changes requested by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/19494#pullrequestreview-2099238884 From stuefe at openjdk.org Wed Jun 5 13:41:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 5 Jun 2024 13:41:16 GMT Subject: RFR: 8322475: Extend printing for System.map [v3] In-Reply-To: References: Message-ID: > This is an expansion on the new `System.map` command introduced with JDK-8318636. > > We now print valuable information per memory region, such as: > > - the actual resident set size > - the actual number of huge pages > - the actual used page size > - the THP state of the region (was advised, is eligible, uses THP, ...) > - whether the region is shared > - whether the region had been committed (backed by swap) > - whether the region has been swapped out. 
> > Example output: > > > from to size rss hugetlb pgsz prot notes vm info/file > 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage > 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage > 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') > 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'non-profiled nmethods') > 0x00007f3a7c802000 - 0x00007f3a839f200... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - fix whitespace issue - wip - exhuming - Merge branch 'master' into System.maps-more-info - Merge - remove codecache name printing - stefank feedback - remove page size histo - wip - get rid of the limit for ostream - ... 
and 6 more: https://git.openjdk.org/jdk/compare/b101dcb6...3b14b086 ------------- Changes: https://git.openjdk.org/jdk/pull/17158/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17158&range=02 Stats: 646 lines in 14 files changed: 464 ins; 98 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/17158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17158/head:pull/17158 PR: https://git.openjdk.org/jdk/pull/17158 From simonis at openjdk.org Wed Jun 5 14:04:04 2024 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 5 Jun 2024 14:04:04 GMT Subject: RFR: 8329421: Native methods can not be selectively printed [v2] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 10:08:03 GMT, Wang Haomin wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Add test for -XX:+PrintNativeNMethods > > Hi @simonis , I encountered a build error after this commit. When `make images` using congiure `--with-jvm-variants=core`, the error is as follows: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000fffdea9ad1f4, pid=1696170, tid=1696241 > # > # JRE version: OpenJDK Runtime Environment (23.0) (fastdebug build 23-internal-adhoc.wanghaomin.jdk-ls) > # Java VM: OpenJDK 64-Bit Core VM (fastdebug 23-internal-adhoc.wanghaomin.jdk-ls, interpreted mode, compressed oops, compressed class ptrs, serial gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0x8ed1f4] BasicMatcher::match(methodHandle const&)+0x15c > # > # Core dump will be written. 
Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/wanghaomin/jdk-ls/make/core.1696170) > # > # An error report file with more information is saved as: > # /home/wanghaomin/jdk-ls/make/hs_err_pid1696170.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > /usr/bin/bash: line 1: 1696170 Aborted (core dumped) /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/jdk/bin/javac -g -Xlint:all -source 23 -target 23 -implicit:none -Xprefer:source -XDignore.symbol.file=true -encoding ascii -Werror --add-modules jdk.compiler,jdk.jdeps --add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.jvm=ALL-UNNAMED --add-exports jdk.jdeps/com.sun.tools.classfile=ALL-UNNAMED -Xlint:-options -XDmodifiedInputs=/home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch.modfiles.fixed -d /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac @/home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBO LS_batch.filelist > >(/usr/bin/tee -a /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch.log) 2> >(/usr/bin/tee -a /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch.log >&2) > gmake[3]: *** [Gendata.gmk:64: /home/wanghaomin/jdk-ls/build/linux-aarch64-core-fastdebug/buildtools/c... Hi @haominw, Unfortunately I can't reproduce this crash. I've tried with the latest HEAD revision from today and with the commit introduced by this PR. 
In both cases, when I configure with `--with-jvm-variants=core` and kick off the build, I run into the following compilation error: /priv/simonisv/OpenJDK/Git/jdk/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp: In function ‘bool is_c2_compilation()’: /priv/simonisv/OpenJDK/Git/jdk/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp:366:30: error: incomplete type ‘ciEnv’ used in nested name specifier 366 | CompileTask* task = ciEnv::current()->task(); | ^~~~~~~ Can you please detail the exact steps to reproduce the crash? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18567#issuecomment-2150065791 From szaldana at openjdk.org Wed Jun 5 14:17:21 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 5 Jun 2024 14:17:21 GMT Subject: RFR: 8330420: Inverted use of DisplayVMOutputToStderr in ostream_exit Message-ID: Hi all, This PR addresses the inverted clauses in ostream_exit. Thanks, Sonia ------------- Commit messages: - Merge branch 'openjdk:master' into JDK-8330420 - 8330420: Inverted use of DisplayVMOutputToStderr in ostream_exit Changes: https://git.openjdk.org/jdk/pull/18897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18897&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330420 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18897/head:pull/18897 PR: https://git.openjdk.org/jdk/pull/18897 From pchilanomate at openjdk.org Wed Jun 5 14:27:59 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 5 Jun 2024 14:27:59 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v4] In-Reply-To: References: Message-ID: On Mon, 3 Jun 2024 22:55:24 GMT, Serguei Spitsyn wrote: >> Please, review the following `interp-only` issue related to carrier threads.
>> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. >> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. >> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: get rid of unneeded casts in new test Thanks Serguei, looks good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19438#pullrequestreview-2099384221 From jsjolen at openjdk.org Wed Jun 5 14:33:59 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 5 Jun 2024 14:33:59 GMT Subject: RFR: 8330420: Inverted use of DisplayVMOutputToStderr in ostream_exit In-Reply-To: References: Message-ID: <6nNJkLIeAzPNvaXyROcDqi4p1EOSdXGQvrrGFPK4PXc=.6d0894e1-1fb7-4438-bdf6-6f29c8bec485@github.com> On Mon, 22 Apr 2024 18:31:31 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses the inverted clauses in ostream_exit. > > Thanks, > Sonia Oops :-). I'm not very experienced with this code, I hope that someone can declare this to be trivial so that you can merge it without the 24-hour window. @coleenp, what do you think?
Trivial? Thanks. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18897#pullrequestreview-2099401938 From stuefe at openjdk.org Wed Jun 5 14:39:58 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 5 Jun 2024 14:39:58 GMT Subject: RFR: 8330420: Inverted use of DisplayVMOutputToStderr in ostream_exit In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:31:31 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses the inverted clauses in ostream_exit. > > Thanks, > Sonia Hah. Good catch. I wondered whether we should be concerned about backward compatibility here, since this seems to be a very old bug and folks may be used to how this works. But this only affects a small time window at the end of VM exit. It's probably okay. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18897#pullrequestreview-2099421234 From stuefe at openjdk.org Wed Jun 5 14:42:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 5 Jun 2024 14:42:59 GMT Subject: RFR: 8330420: Inverted use of DisplayVMOutputToStderr in ostream_exit In-Reply-To: <6nNJkLIeAzPNvaXyROcDqi4p1EOSdXGQvrrGFPK4PXc=.6d0894e1-1fb7-4438-bdf6-6f29c8bec485@github.com> References: <6nNJkLIeAzPNvaXyROcDqi4p1EOSdXGQvrrGFPK4PXc=.6d0894e1-1fb7-4438-bdf6-6f29c8bec485@github.com> Message-ID: On Wed, 5 Jun 2024 14:30:55 GMT, Johan Sjölen wrote: > Oops :-). I'm not very experienced with this code, I hope that someone can declare this to be trivial so that you can merge it without the 24-hour window. > > @coleenp, what do you think? Trivial? > > Thanks. I would not push it for 23. Let's give it some cooking time in 24. Even though remote, I worry about backward compatibility issues with programs that expect the JVM output to be the way it is now.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18897#issuecomment-2150224763 From szaldana at openjdk.org Wed Jun 5 16:02:01 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 5 Jun 2024 16:02:01 GMT Subject: Integrated: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:12:25 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR addresses [8332785](https://bugs.openjdk.org/browse/JDK-8332785) replacing all naked uses of ```UseSharedSpaces``` with ```CDSConfig::is_using_archive```. > > Testing: > - [x] Tier 1 with GHA. > > Thanks, > Sonia This pull request has now been integrated. Changeset: 438121be Author: Sonia Zaldana Calles Committer: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/438121be6bdb085fa13ad14ec53b09ecdbd4757d Stats: 113 lines in 35 files changed: 8 ins; 0 del; 105 mod 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive Reviewed-by: dholmes, stuefe, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/19463 From jsjolen at openjdk.org Wed Jun 5 16:20:14 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 5 Jun 2024 16:20:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage Message-ID: Hi, This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open to better names. We then use this allocator in order to store the `NativeCallStackStorage`. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine; we do not ever remove stacks from the storage.
The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc., so it's not an enemy to IDE help. It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. The results are as follows on linux-x64-slowdebug: Generate stacks... Done Time taken with GrowableArray: 8341.240945 Time taken with CHeap: 12189.031318 Time taken with Arena: 8800.703092 Time taken with GrowableArray again: 8295.508829 And on linux-x64: Time taken with GrowableArray: 8378.018135 Time taken with CHeap: 12437.347868 Time taken with Arena: 8758.064717 Time taken with GrowableArray again: 8391.076291 Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
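For readers who have not opened the PR, the idea described above (4-byte indices into a growable array, with unused slots chained into a freelist) can be sketched outside HotSpot roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `std::vector` stands in for `GrowableArray`, the names `IndexedFreelist`, `allocate`, `deallocate`, `at`, `capacity` and `nil` are hypothetical, and the sketch restricts itself to trivially copyable element types to keep the union handling simple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical sketch, not the allocator from the PR. It hands out 32-bit
// indices instead of 64-bit pointers and never shrinks its backing storage.
template <typename T>
class IndexedFreelist {
  static_assert(std::is_trivially_copyable<T>::value &&
                std::is_trivially_destructible<T>::value,
                "this sketch only handles trivially copyable element types");
public:
  using Index = std::uint32_t;  // the 4-byte "pointer"
  static constexpr Index nil = std::numeric_limits<Index>::max();

  // Store a copy of v and return its index, reusing a freed slot if one exists.
  Index allocate(const T& v) {
    Index i;
    if (_free_head != nil) {
      i = _free_head;                     // pop the head of the freelist
      _free_head = _slots[i].next_free;
    } else {
      assert(_slots.size() < nil);        // nil is reserved as the null index
      i = static_cast<Index>(_slots.size());
      _slots.emplace_back();              // grow the backing array by one slot
    }
    new (&_slots[i].value) T(v);          // make 'value' the active union member
    return i;
  }

  // Return slot i to the freelist. The memory itself is kept for reuse.
  void deallocate(Index i) {
    assert(i < _slots.size());
    _slots[i].next_free = _free_head;     // push onto the freelist
    _free_head = i;
  }

  T& at(Index i) {
    assert(i < _slots.size());
    return _slots[i].value;
  }

  std::size_t capacity() const { return _slots.size(); }

private:
  // A slot holds either a live T or, while free, the index of the next free slot.
  union Slot {
    T value;
    Index next_free;
    Slot() : next_free(nil) {}
  };
  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

In this sketch a freed index is handed out again by the next `allocate` call, and `capacity()` only ever grows, which mirrors the "cannot shrink" property mentioned above. Whether the real allocator reuses slots in the same order is not something this sketch can confirm.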
------------- Commit messages: - precompiled headers are great - Perf Arena also - Perf test - Alter docs - Move the files - Accommodate GrowableArray - Spacing - Use the IndexedFreelistAllocator in NCSS - The indexed freelist allocator Changes: https://git.openjdk.org/jdk/pull/18979/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333658 Stats: 410 lines in 4 files changed: 390 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From matsaave at openjdk.org Wed Jun 5 16:22:00 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 5 Jun 2024 16:22:00 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4] In-Reply-To: References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Tue, 4 Jun 2024 07:34:34 GMT, David Holmes wrote: >> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. >> >> Adds unit testing for the specialized cases. >> >> See JBS for discussion of other suggestions. >> >> Testing: - tiers 1-4 >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Simplified the test code - thanks @tstuefe! > Rewrote the comment block describing do_vsnprintf. Changes and test look good! I have two things I'm not sure about but otherwise, I approve. src/hotspot/share/utilities/ostream.hpp line 45: > 43: // This allows for redirection via -XX:+DisplayVMOutputToStdout and > 44: // -XX:+DisplayVMOutputToStderr. > 45: // Extra lines added here ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19512#pullrequestreview-2099526414 PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1627981047 From jsjolen at openjdk.org Wed Jun 5 16:28:57 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 5 Jun 2024 16:28:57 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 14:29:53 GMT, Johan Sjölen wrote: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open to better names. > > We then use this allocator in order to store the `NativeCallStackStorage`. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine; we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc., so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks...
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. An obvious improvement: Have the returned pointers remember which allocator object they were allocated from when in debug mode, to avoid freeing an element using the wrong allocator. You can also support detecting double frees by assigning each allocation a unique id. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2150479232 From cjplummer at openjdk.org Wed Jun 5 17:02:59 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 5 Jun 2024 17:02:59 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 23:52:36 GMT, Serguei Spitsyn wrote: >> Yes, my point was that this section is only for return values. The section is titled "Function Return Values". Maybe we should add another short section just before this one to describe what is meant by "null pointer". > > Okay, thanks. What about the following: > > diff --git a/src/hotspot/share/prims/jvmti.xml b/src/hotspot/share/prims/jvmti.xml > index a6ebd0d42c5..a81014c70bb 100644 > --- a/src/hotspot/share/prims/jvmti.xml > +++ b/src/hotspot/share/prims/jvmti.xml > @@ -995,7 +995,10 @@ jvmtiEnv *jvmti; > across threads and are created dynamically. > > > - > + > + There are a few functions where a parameter value can be a null pointer > + (C NULL or C++ nullptr), e.g. the thread parameter > + can be a null pointer to mean the current thread. > functions always return an > error code via the > function return value.
> @@ -1004,7 +1007,7 @@ jvmtiEnv *jvmti; > In some cases, functions allocate memory that your program must > explicitly deallocate. This is indicated in the individual > function descriptions. Empty lists, arrays, sequences, etc are > - returned as a null pointer (C NULL or C++ nullptr). > + returned as a null pointer. >

> In the event that the function encounters > an error (any return value other than JVMTI_ERROR_NONE) the values > > > I can try to add a couple of more examples where a null pointer can be passed as a parameter value if it is desirable. I'm still not sure this works. It seems kind of muddled. Rather than trying to retrofit in the clarifying text, why not start from scratch. That should result in better organization and clearer descriptions. For example, I think first you should clarify what is meant by a "null pointer". Maybe even make that a separate section. I can take a stab at this later today if you want. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1628123917 From sspitsyn at openjdk.org Wed Jun 5 17:27:58 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 5 Jun 2024 17:27:58 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v4] In-Reply-To: References: Message-ID: On Mon, 3 Jun 2024 22:55:24 GMT, Serguei Spitsyn wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. 
>> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. >> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: get rid of unneeded casts in new test Thank you for the review, Patricio! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19438#issuecomment-2150584829 From sviswanathan at openjdk.org Wed Jun 5 18:13:02 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 5 Jun 2024 18:13:02 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) In-Reply-To: References: Message-ID: On Mon, 1 Apr 2024 12:01:27 GMT, Jatin Bhateja wrote: > Summary of changes included with the patch: > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin Marked as reviewed by sviswanathan (Reviewer). src/hotspot/cpu/x86/vm_version_x86.cpp line 113: > 111: VM_Version_StubGenerator(CodeBuffer *c) : StubCodeGenerator(c) {} > 112: > 113: address clear_apx_test_state() { Why do we need to clear_apx_test_state? r16 onwards are not callee saved. And checking r15 save/restore is not needed so we could remove r15 changes altogether. src/hotspot/cpu/x86/vm_version_x86.cpp line 433: > 431: __ jcc(Assembler::notEqual, vector_save_restore); > 432: > 433: /* FIXME: Uncomment after integration of JDK-8328998 Did you mean to uncomment these now that JDK-8328998 has been integrated?
src/hotspot/cpu/x86/vm_version_x86.cpp line 434: > 432: > 433: /* FIXME: Uncomment after integration of JDK-8328998 > 434: __ mov64(r15, VM_Version::egpr_test_value()); Why are we modifying r15? It is not an APX egpr. src/hotspot/cpu/x86/vm_version_x86.cpp line 435: > 433: /* FIXME: Uncomment after integration of JDK-8328998 > 434: __ mov64(r15, VM_Version::egpr_test_value()); > 435: __ mov64(r16, VM_Version::egpr_test_value()); You would need to temporarily set UseAPX feature before generating this instruction, otherwise assembler will complain. src/hotspot/cpu/x86/vm_version_x86.cpp line 447: > 445: /* FIXME: Uncomment after integration of JDK-8328998 > 446: __ mov64(rax, VM_Version::egpr_test_value()); > 447: __ cmpq(rax, r15); Likewise r15 validation can be removed. src/hotspot/cpu/x86/vm_version_x86.cpp line 456: > 454: // Generate SEGV to signal unsuccessful save/restore. > 455: __ bind(apx_save_restore_error); > 456: __ lea(rax, ExternalAddress(VM_Version::_apx_state_restore_error_handler)); Generating an error message here won't be the right thing (especially since this is default by feature detection). It should only result in setting UseAPX feature to false. src/hotspot/cpu/x86/vm_version_x86.hpp line 476: > 474: uint32_t dcp_cpuid4_edx; // unused currently > 475: > 476: // cpuid function 7 (structured extended features enumeration leaf) Good to add here a comment: // eax = 7, ecx = 0 src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp line 420: > 418: > 419: #ifndef PRODUCT > 420: if ((sig == SIGSEGV) && VM_Version::is_cpuinfo_segv_addr_apx(pc)) { Do we want to include SIGBUS also here like above? 
------------- PR Review: https://git.openjdk.org/jdk/pull/18562#pullrequestreview-2097590632 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1626760270 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1628196385 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1626759049 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1628197767 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1626759263 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1628208548 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1626753662 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1626756465 From ccheung at openjdk.org Wed Jun 5 19:11:57 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 5 Jun 2024 19:11:57 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <7eca2PnRmrfCFnua7sipAivaGRl_jhOlXqr_1lM5ExU=.e6ad2522-4dad-428c-9a85-4e22c201baa2@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> <9k9sDnon7SehCf1oFouQwWKo2YBTlSwZrBTN3e3ezzw=.070cccf2-cfc6-4a51-9bd6-02e17eacc50b@github.com> <2H_eDsrF-8iXVs6DKjKoLzXaMQ0ZlJzXCVu0eQE51U4=.4ac41a7e-6534-4310-9b00-30644b16c358@github.com> <7eca2PnRmrfCFnua7sipAivaGRl_jhOlXqr_1lM5ExU=.e6ad2522-4dad-428c-9a85-4e22c201baa2@github.com> Message-ID: On Wed, 5 Jun 2024 04:41:22 GMT, Ioi Lam wrote: >>> > The -Xlog:init (perhaps with a better name/tag!) >>> >> >> How about `-Xlog:initcounters` as the name/tag? It is similar to @dholmes-ora 's suggestion above without the underscore between "init" and "counters". 
Currently, none of the logging tags contains an underscore character. >> >> @iwanowww, @iklam, @dholmes-ora, what do you guys think? > > While these counters are useful for start-up measurement, they can be used for other purposes (e.g., monitoring how much time is spent in linking classes in a long running application). So I think the logging tag should be more neutral. > > We already have `-Xlog:perf`. Maybe we can have sub options like `-Xlog:perf+class+link` for the counters in this PR? > > I know an upcoming PR will be for stats for MutexLocker. Something like `-Xlog:perf+lock` would work for such stats. The above suggestion is fine with me. The `perf` tag already exists; I noticed that if I specified `-Xlog:perf*`, the following log output appeared: [0.002s][info][perf,memops] Trying to open /tmp/... [0.003s][info][perf,memops] Successfully opened It is probably ok if there are not too many log lines like the above. I've made the following changes and will push another commit after some testing: - added the `link` logging tag; both `perf` and `class` tags are already there; - removed the `ProfileClassLinkage` diagnostic flag; - added a global bool variable `_perf_class_link` in arguments.hpp; it will be set to true if `-Xlog:perf+class+link` is specified. I think it is more efficient to check `Argument::perf_class_link()` vs `log_is_enabled(Info, perf, class, link)` - changed the check for `log_is_enabled(Info, init)` to `log_is_enabled(Info, perf, class, link)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1628306814 From jvernee at openjdk.org Wed Jun 5 19:29:03 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 5 Jun 2024 19:29:03 GMT Subject: RFR: 8325984: 4 jcstress tests are failing in Tier6 4 times each Message-ID: These 4 tests were failing due to an incompatibility with jcstress. They were problemlisted in the past (https://bugs.openjdk.org/browse/JDK-8326062).
Now that jcstress has been updated (https://github.com/openjdk/jdk/pull/19332) with the relevant fix (https://github.com/openjdk/jcstress/pull/147), we can re-enable these tests. Testing: I've verified that all 4 tests now pass on Linux-x64 ------------- Commit messages: - reenable jcstress tests Changes: https://git.openjdk.org/jdk/pull/19565/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19565&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325984 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19565/head:pull/19565 PR: https://git.openjdk.org/jdk/pull/19565 From sviswanathan at openjdk.org Wed Jun 5 19:40:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 5 Jun 2024 19:40:57 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) In-Reply-To: References: Message-ID: On Mon, 1 Apr 2024 12:01:27 GMT, Jatin Bhateja wrote: > Summary of changes include with the patch:- > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin @jatin-bhateja Please ignore my approval above; it was a mistake, and I don't know how to undo that. Please do look into the review comments/suggestions above.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2150819857 From matsaave at openjdk.org Wed Jun 5 19:51:58 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 5 Jun 2024 19:51:58 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4] In-Reply-To: References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Tue, 4 Jun 2024 07:34:34 GMT, David Holmes wrote: >> Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. >> >> Adds unit testing for the specialized cases. >> >> See JBS for discussion of other suggestions. >> >> Testing: - tiers 1-4 >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Simplified the test code - thanks @tstuefe! > Rewrote the comment block describing do_vsnprintf. test/hotspot/gtest/utilities/test_ostream.cpp line 161: > 159: size_t initial_len = strlen(str); > 160: ASSERT_TRUE(initial_len < max_len); > 161: result = test(&buffer[0], buflen, false, result_len, str); I must have forgotten to add this comment before but here it is again: Is there a reason you chose to use `&buffer[0]` rather than simple `buffer`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1628355549 From ccheung at openjdk.org Wed Jun 5 20:59:09 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 5 Jun 2024 20:59:09 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v7] In-Reply-To: References: Message-ID: > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. 
The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: implement -Xlog:perf+class+link ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18790/files - new: https://git.openjdk.org/jdk/pull/18790/files/3b20f1d6..4c224f55 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=05-06 Stats: 33 lines in 9 files changed: 11 ins; 8 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From sspitsyn at openjdk.org Wed Jun 5 21:49:56 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 5 Jun 2024 21:49:56 GMT Subject: Integrated: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes In-Reply-To: References: Message-ID: On Tue, 28 May 2024 22:24:53 GMT, Serguei Spitsyn wrote: > Please, review the following `interp-only` issue related to carrier threads. > There are 3 problems fixed here: > - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. 
> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. > - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. > > The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. > > Testing: > - Ran new test case locally > - Ran mach5 tiers 1-6 This pull request has now been integrated. Changeset: 60ea17e8 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/60ea17e8482936a6acbc442bb1be199e01008072 Stats: 251 lines in 9 files changed: 229 ins; 12 del; 10 mod 8311177: Switching to interpreter only mode in carrier thread can lead to crashes Reviewed-by: pchilanomate, amenkov ------------- PR: https://git.openjdk.org/jdk/pull/19438 From ccheung at openjdk.org Wed Jun 5 22:24:48 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 5 Jun 2024 22:24:48 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> Message-ID: <0B3DXYpRB8P6bEQP2ACupaLG9RRAfEe3PflYvpE3ORs=.4a2b8b09-2743-48ac-baff-f2fc6df3944b@github.com> On Tue, 4 Jun 2024 05:13:58 GMT, David Holmes wrote: >>> The -Xlog:init (perhaps with a better name/tag!) >> >> I'm all for a better naming scheme. Any suggestions? > >> -Xlog:init means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs. > > I don't agree. 
Initialization logging could encompass many different things, some of which are individually controllable via different flags. Simply turning on init logging should not turn on all such flags. If you want that level of coupling then perhaps use init_counters (or something like that) to make it clear this is not a general log tag intended for any initialization code to use, but something you have chosen to tie to specific functionality. > >> We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters > > It is not clear to me how you envisage that working. You want individual group switches plus a global one? @dholmes-ora, @iklam Could you review my latest commit? Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1628504608 From cjplummer at openjdk.org Wed Jun 5 22:52:45 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 5 Jun 2024 22:52:45 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 17:00:34 GMT, Chris Plummer wrote: >> Okay, thanks. What about the following: : >> >> diff --git a/src/hotspot/share/prims/jvmti.xml b/src/hotspot/share/prims/jvmti.xml >> index a6ebd0d42c5..a81014c70bb 100644 >> --- a/src/hotspot/share/prims/jvmti.xml >> +++ b/src/hotspot/share/prims/jvmti.xml >> @@ -995,7 +995,10 @@ jvmtiEnv *jvmti; >> across threads and are created dynamically. >> >> >> - >> + >> + There are a few functions where a parameter value can be a null pointer >> + (C NULL or C++ nullptr), e.g. the thread parameter >> + can be a null pointer to mean the current thread. >> functions always return an >> error code via the >> function return value. >> @@ -1004,7 +1007,7 @@ jvmtiEnv *jvmti; >> In some cases, functions allocate memory that your program must >> explicitly deallocate. This is indicated in the individual >> function descriptions. 
Empty lists, arrays, sequences, etc are >> - returned as a null pointer (C NULL or C++ nullptr). >> + returned as a null pointer. >>

>> In the event that the function encounters >> an error (any return value other than JVMTI_ERROR_NONE) the values >> >> >> I can try to add a couple of more examples where a null pointer can be passed as a parameter value if it is desirable. > > I'm still not sure this works. It seems kind of muddled. Rather than trying to retrofit in the clarifying text, why not start from scratch. That should result in better organization and clearer descriptions. For example, I think first you should clarify what is meant by a "null pointer". Maybe even make that a separate section. I can take a stab at this later today if you want. How about undoing the changes in this subsection and then just add the following as a preceding subsection: **Null Pointers** Parts of this specification refer to a "null pointer" as a possible function parameter or return value. A "null pointer" is C `NULL` or C++ `nullptr`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1628532037 From dholmes at openjdk.org Wed Jun 5 23:19:00 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Jun 2024 23:19:00 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v5] In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: <9xBpleRi18oQMyKGVeqwp1iJdUcIoH1TQvjc8Z4BzPo=.5d215184-62b8-44f5-977c-2a13f2503684@github.com> > Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. > > Adds unit testing for the specialized cases. > > See JBS for discussion of other suggestions. 
> > Testing: - tiers 1-4 > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Remove extra line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19512/files - new: https://git.openjdk.org/jdk/pull/19512/files/1aef14ca..e36c2c9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19512/head:pull/19512 PR: https://git.openjdk.org/jdk/pull/19512 From dholmes at openjdk.org Wed Jun 5 23:19:00 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Jun 2024 23:19:00 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4] In-Reply-To: References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Wed, 5 Jun 2024 15:17:41 GMT, Matias Saavedra Silva wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified the test code - thanks @tstuefe! >> Rewrote the comment block describing do_vsnprintf. > > src/hotspot/share/utilities/ostream.hpp line 45: > >> 43: // This allows for redirection via -XX:+DisplayVMOutputToStdout and >> 44: // -XX:+DisplayVMOutputToStderr. 
>> 45: // > > Extra lines added here Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1628550509 From sspitsyn at openjdk.org Wed Jun 5 23:49:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 5 Jun 2024 23:49:08 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v6] In-Reply-To: References: Message-ID: > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. > > Testing: N/A (not needed) Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge - review: consistency and stylistical corrections - review: more null pointer corrections - review: replace nullptr with null pointer in the docs - review: corrected the nullptr clarification - 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers ------------- Changes: https://git.openjdk.org/jdk/pull/19257/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=05 Stats: 82 lines in 4 files changed: 0 ins; 0 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/19257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19257/head:pull/19257 PR: https://git.openjdk.org/jdk/pull/19257 From sspitsyn at openjdk.org Wed Jun 5 23:49:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 5 Jun 2024 23:49:08 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: References: Message-ID: <5_UvYEgMkoAnMFEEyAziKgFLgMz8HVfUK6a0t1_4RgU=.844286cd-8529-4875-8938-6da37d86c6b1@github.com> On Wed, 5 Jun 2024 22:49:51 GMT, Chris Plummer 
wrote: >> I'm still not sure this works. It seems kind of muddled. Rather than trying to retrofit in the clarifying text, why not start from scratch. That should result in better organization and clearer descriptions. For example, I think first you should clarify what is meant by a "null pointer". Maybe even make that a separate section. I can take a stab at this later today if you want. > > How about undoing the changes in this subsection and then just add the following as a preceding subsection: > > **Null Pointers** > > Parts of this specification refer to a "null pointer" as a possible function parameter or return value. A "null pointer" is C `NULL` or C++ `nullptr`. Good suggestion, thanks! Will add it now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1628565113 From sspitsyn at openjdk.org Wed Jun 5 23:57:02 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 5 Jun 2024 23:57:02 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v7] In-Reply-To: References: Message-ID: > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. 
> > Testing: N/A (not needed) Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: add a sub-section: Null Pointers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19257/files - new: https://git.openjdk.org/jdk/pull/19257/files/6275df3a..16cea131 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=05-06 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19257/head:pull/19257 PR: https://git.openjdk.org/jdk/pull/19257 From cjplummer at openjdk.org Thu Jun 6 00:00:52 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 6 Jun 2024 00:00:52 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v7] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 23:57:02 GMT, Serguei Spitsyn wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: add a sub-section: Null Pointers Looks good. Thanks for the changes. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19257#pullrequestreview-2100483644 From sspitsyn at openjdk.org Thu Jun 6 00:12:51 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 6 Jun 2024 00:12:51 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v7] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 23:57:02 GMT, Serguei Spitsyn wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: add a sub-section: Null Pointers Thank you for the review, Kim and Chris! Alan, Kim, David, Chris et al., thanks for the discussion and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19257#issuecomment-2151151302 From dholmes at openjdk.org Thu Jun 6 00:19:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Jun 2024 00:19:01 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v6] In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: <9C9P-vLrn6dfSisiDmvJGAdOexqO7ovMHqZ9PBVmVgY=.b4ef8ab1-40ec-4f4f-914e-3e5e4c59d9dd@github.com> > Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. > > Adds unit testing for the specialized cases. > > See JBS for discussion of other suggestions.
> > Testing: - tiers 1-4 > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Simplify test expression ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19512/files - new: https://git.openjdk.org/jdk/pull/19512/files/e36c2c9b..6b1b29d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19512&range=04-05 Stats: 18 lines in 1 file changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/19512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19512/head:pull/19512 PR: https://git.openjdk.org/jdk/pull/19512 From dholmes at openjdk.org Thu Jun 6 00:19:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Jun 2024 00:19:01 GMT Subject: RFR: 8256828: ostream::print_cr() truncates buffer in copy-through case [v4] In-Reply-To: References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Wed, 5 Jun 2024 16:19:02 GMT, Matias Saavedra Silva wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified the test code - thanks @tstuefe! >> Rewrote the comment block describing do_vsnprintf. > > Changes and test look good! I have two things I'm not sure about but otherwise, I approve. Thanks for the review @matias9927 ! > test/hotspot/gtest/utilities/test_ostream.cpp line 161: > >> 159: size_t initial_len = strlen(str); >> 160: ASSERT_TRUE(initial_len < max_len); >> 161: result = test(&buffer[0], buflen, false, result_len, str); > > I must have forgotten to add this comment before but here it is again: > Is there a reason you chose to use `&buffer[0]` rather than simple `buffer`? At some point when writing the test code it wouldn't compile (perhaps I had a misplaced const at the time?) but now it does - fixed. 
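A minimal sketch of the point settled above, assuming `buffer` is a plain `char` array as the quoted test line suggests. The helper `fill` and the function `both_spellings_agree` are hypothetical stand-ins written for this illustration; they are not taken from test_ostream.cpp. For a real array, the array name decays to a pointer to its first element, so `&buffer[0]` and `buffer` hand the callee the same address:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a test helper that only needs a char* and a length.
// It copies str into buf, truncating if necessary, and returns the copied length.
static size_t fill(char* buf, size_t buflen, const char* str) {
  size_t n = strlen(str);
  if (n >= buflen) {
    n = buflen - 1;            // leave room for the terminating NUL
  }
  memcpy(buf, str, n);
  buf[n] = '\0';
  return n;
}

// Returns true when both spellings refer to the same storage and
// produce the same result when passed to the helper.
static bool both_spellings_agree() {
  char buffer[16];
  // Array-to-pointer decay: `buffer` converts to `&buffer[0]` in this context.
  if (&buffer[0] != buffer) {
    return false;
  }
  size_t a = fill(&buffer[0], sizeof(buffer), "hello");
  size_t b = fill(buffer, sizeof(buffer), "hello");
  return a == b && strcmp(buffer, "hello") == 0;
}
```

The two spellings differ only when `buffer` is not a raw array (for example a class type with an overloaded `operator[]`), which is not the case assumed here.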
------------- PR Comment: https://git.openjdk.org/jdk/pull/19512#issuecomment-2151154775 PR Review Comment: https://git.openjdk.org/jdk/pull/19512#discussion_r1628579597 From dholmes at openjdk.org Thu Jun 6 00:19:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Jun 2024 00:19:01 GMT Subject: Integrated: 8256828: ostream::print_cr() truncates buffer in copy-through case In-Reply-To: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> References: <7sjAHNVe08bQX2n2Kp38Ppxqf9zem-oYahE3dO_la84=.20f4fbd3-2c20-415a-8e75-275d69cf4f7b@github.com> Message-ID: On Sun, 2 Jun 2024 22:05:40 GMT, David Holmes wrote: > Clarifies the behaviour of this function in regards to truncation when adding a CR. Ensures a truncation warning is always issued. > > Adds unit testing for the specialized cases. > > See JBS for discussion of other suggestions. > > Testing: - tiers 1-4 > > Thanks This pull request has now been integrated. Changeset: ca939075 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/ca9390755bc652251bdcfd9ec2a583680a63fddf Stats: 319 lines in 3 files changed: 300 ins; 2 del; 17 mod 8256828: ostream::print_cr() truncates buffer in copy-through case Reviewed-by: stuefe, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/19512 From wanghaomin at openjdk.org Thu Jun 6 01:32:06 2024 From: wanghaomin at openjdk.org (Wang Haomin) Date: Thu, 6 Jun 2024 01:32:06 GMT Subject: RFR: 8329421: Native methods can not be selectively printed [v2] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 07:23:25 GMT, Volker Simonis wrote: >> Native methods (i.e. "native wrappers") can not be selectively printed with `-XX:CompileCommand=print,class::method`. Currently the only way to print native methods is to use the global `-XX:+PrintAssembly` option. But this prints *all* compiled methods which can be too much if we're just interested in a specific native wrapper. 
There's no reason to not apply `-XX:CompileCommand` options correctly to native methods as well. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Add test for -XX:+PrintNativeNMethods First, I've built on aarch64. `git checkout jdk-23+17` or `git checkout 3057dded4878b0110bc2c09b52019570a0a31c9f`. Secondly, configure with `--with-debug-level=fastdebug --with-jvm-variants=core`. Finally, `make images CONF=core`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18567#issuecomment-2151237267 From duke at openjdk.org Thu Jun 6 03:11:51 2024 From: duke at openjdk.org (Liming Liu) Date: Thu, 6 Jun 2024 03:11:51 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 07:49:22 GMT, Stefan Karlsson wrote: >> Liming Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the wrong condition > > I'm running this through our tier1-tier3 testing now. Hi @stefank, did the testing run fine?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2151331689 From qamai at openjdk.org Thu Jun 6 03:43:44 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 6 Jun 2024 03:43:44 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v5] In-Reply-To: <5_UvYEgMkoAnMFEEyAziKgFLgMz8HVfUK6a0t1_4RgU=.844286cd-8529-4875-8938-6da37d86c6b1@github.com> References: <5_UvYEgMkoAnMFEEyAziKgFLgMz8HVfUK6a 0t1_4RgU=.844286cd-8529-4875-8938-6da37d86c6b1@github.com> Message-ID: On Wed, 5 Jun 2024 23:43:20 GMT, Serguei Spitsyn wrote: >> How about undoing the changes in this subsection and then just add the following as a preceding subsection: >> >> **Null Pointers** >> >> Parts of this specification refer to a "null pointer" as a possible function parameter or return value. A "null pointer" is C `NULL` or C++ `nullptr`. > > Good suggestion, thanks! Updated now. A "null pointer" is well-defined in the language itself so I don't think there is any need to clarify it here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1628735822 From sspitsyn at openjdk.org Thu Jun 6 04:24:48 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 6 Jun 2024 04:24:48 GMT Subject: Integrated: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers In-Reply-To: References: Message-ID: On Thu, 16 May 2024 02:37:40 GMT, Serguei Spitsyn wrote: > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. > > Testing: N/A (not needed) This pull request has now been integrated. 
Changeset: 30894126 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/30894126a4ba8bc41c333c923ff3007503257688 Stats: 87 lines in 4 files changed: 5 ins; 0 del; 82 mod 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers Reviewed-by: kbarrett, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/19257 From stuefe at openjdk.org Thu Jun 6 06:03:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Jun 2024 06:03:49 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 03:33:30 GMT, Liming Liu wrote: >> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Fix the wrong condition Just stating this here, I don't like the fact that this patch punishes all Distros for a problem that only exists in UEK kernel. This is not a problem with the mainline kernel. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2151471883 From duke at openjdk.org Thu Jun 6 07:18:47 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 6 Jun 2024 07:18:47 GMT Subject: RFR: 8333343: [REDO] AArch64: optimize integer remainder [v3] In-Reply-To: References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: On Thu, 30 May 2024 10:16:03 GMT, Andrew Haley wrote: >> Jin Guojie has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'dev0530' of https://github.com/jinguojie-alibaba/jdk into dev0530 >> - MacroAssembler::msub() takes a scratch register as an argument > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 446: > >> 444: >> 445: void msub(Register Rd, Register Rn, Register Rm, Register Ra, Register tmp = rscratch2); >> 446: void msubw(Register Rd, Register Rn, Register Rm, Register Ra, Register tmp = rscratch2); > > Please delete these two methods that use rscratch2 as a default tmp register. I tried it and encountered a problem: msub/msubw will be called in other functions, inline void mnegw(Register Rd, Register Rn, Register Rm) { msubw(Rd, Rn, Rm, zr); } inline void mneg(Register Rd, Register Rn, Register Rm) { msub(Rd, Rn, Rm, zr); } If we add a parameter to msub/msubw, then 1. All these functions (mnegw, mneg, ...) that call msub/msubw also need to add parameters, 2. Moreover, all the calling functions of mnegw and mneg also need to be modified. The above two effects involve too many code changes. Please see if we can keep a default parameter for msub/msubw so that their calling functions do not need to be modified. 
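The trade-off described above can be sketched in isolation. This is only an illustration under stated assumptions: `Register` is modelled as a string and `SketchAssembler` is a hypothetical class, neither is HotSpot's real `Register` or `MacroAssembler`. It shows why a trailing default argument lets four-argument callers such as `mneg` stay untouched; whether that default is desirable is exactly what the review is debating:

```cpp
#include <string>

// Hypothetical stand-ins; not HotSpot's real register types.
using Register = std::string;
static const Register zr = "zr";
static const Register rscratch2 = "rscratch2";

struct SketchAssembler {
  std::string last;  // records the most recent "emitted" operation

  // Trailing default argument: callers that pass four registers keep
  // compiling, and silently get rscratch2 as the temporary.
  void msub(Register Rd, Register Rn, Register Rm, Register Ra,
            Register tmp = rscratch2) {
    last = "msub " + Rd + ", " + Rn + ", " + Rm + ", " + Ra +
           " (tmp=" + tmp + ")";
  }

  // Wrapper in the shape quoted in the message: it still passes four
  // arguments and needs no change when msub gains the extra parameter.
  void mneg(Register Rd, Register Rn, Register Rm) {
    msub(Rd, Rn, Rm, zr);
  }
};
```

Removing the default would make the scratch register explicit at every call site, which is safer against accidental clobbering but requires touching each wrapper and each of its callers.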
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19471#discussion_r1628899052 From mli at openjdk.org Thu Jun 6 07:54:46 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Jun 2024 07:54:46 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v6] In-Reply-To: <2VxBcA-0qxX3N35u5vnKyT920nTH5llf2k5_sKQcqT8=.23823400-536f-458e-baf7-53f99547abc4@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <2VxBcA-0qxX3N35u5vnKyT920nTH5llf2k5_sKQcqT8=.23823400-536f-458e-baf7-53f99547abc4@github.com> Message-ID: On Wed, 8 May 2024 17:41:23 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. >> >> Besides of the code changes, one important task is to handle the legal process. >> >> Thanks! >> >> ## Performance >> NOTE: >> * `Src` means implementation in this pr, i.e. without depenency on external sleef. >> * `Disabled` means disable intrinsics by `-XX:-UseVectorStubs` >> * `system_sleef` means implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. build and run jdk with depenency on external sleef. 
>> >> Basically, the perf data below shows that >> * this implementation has better performance than previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294), >> * and both sleef versions have much better performance compared with non-sleef version. >>
>> |Benchmark                     |(size)|Src      |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disabled-Src)/Src|
>> |------------------------------|------|---------|-----|------------|----------------------|---------|------------------|
>> |3472:Double128Vector.ACOS     |1024  |8546.842 |ns/op|8516.007    |-0.004                |16799.273|0.966             |
>> |3473:Double128Vector.ASIN     |1024  |6864.656 |ns/op|6987.328    |0.018                 |16602.442|1.419             |
>> |3474:Double128Vector.ATAN     |1024  |11489.255|ns/op|12261.800   |0.067                 |26329.320|1.292             |
>> |3475:Double128Vector.ATAN2    |1024  |16661.170|ns/op|17234.472   |0.034                 |42084.100|1.526             |
>> |3476:Double128Vector.CBRT     |1024  |18999.387|ns/op|20298.458   |0.068                 |35998.688|0.895             |
>> |3477:Double128Vector.COS      |1024  |14081.857|ns/op|14846.117   |0.054                 |24420.692|0.734             |
>> |3478:Double128Vector.COSH     |1024  |12202.306|ns/op|12237.772   |0.003                 |21343.863|0.749             |
>> |3479:Double128Vector.EXP      |1024  |4553.108 |ns/op|4777.638 ...
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > update header files for arm in progress... ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2151633043 From qpzhang at openjdk.org Thu Jun 6 08:01:48 2024 From: qpzhang at openjdk.org (Patrick Zhang) Date: Thu, 6 Jun 2024 08:01:48 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: <5BQVnl993FEOJPzQTIN-spFZSEtGtPJjFTrU_L87xrc=.b87b3e4a-a773-4c83-88d3-e038adc615f6@github.com> On Mon, 6 May 2024 03:33:30 GMT, Liming Liu wrote: >> The testcase failed on Oracle CI since JDK-8315923.
>> The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14.
>
> Liming Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix the wrong condition

To clarify, the current revised patch does **NOT** add any if-cond to check for UEK; instead, it strengthens the check for the readiness of `MADV_POPULATE_WRITE` on `5.14 and beyond` (as stated by the official Linux kernel docs). In addition, the two test cases failing on the UEK kernel also justify strengthening the condition for `supportMadvPopulateWrite`. We cannot say that the problem exists only in the UEK kernel; in theory, any distro could backport part of the `MADV_POPULATE_WRITE` functionality from 5.14 to an earlier version, or reuse any madvise op code there, which is outside our control and our testing coverage.

Therefore, having the >=5.14 check is the safer option for everyone.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2151650235

From aph at openjdk.org Thu Jun 6 08:31:49 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 6 Jun 2024 08:31:49 GMT
Subject: RFR: 8333343: [REDO] AArch64: optimize integer remainder [v4]
In-Reply-To:
References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com>
Message-ID:

On Fri, 31 May 2024 00:45:33 GMT, Jin Guojie wrote:

>> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction.
>>
>> (1) The following test has passed, which shows performance improvement.
>>
>> make test TEST="micro:java.lang.IntegerDivMod"
>> make test TEST="micro:java.lang.LongDivMod"
>>
>> * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this patch(ns/ops) 1885 improvement(%) 17.93%
>>
>> * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this patch(ns/ops) 1885 improvement(%) 18.03%
>>
>> * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this patch(ns/ops) 1894 improvement(%) 17.79%
>>
>> * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this patch(ns/ops) 1891 improvement(%) 18.03%
>>
>> (2) jtreg test has passed
>>
>> make run-test TEST=tier1
>
> Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Merge branch 'openjdk:master' into dev0530
>  - Merge branch 'dev0530' of https://github.com/jinguojie-alibaba/jdk into dev0530
>  - Merge branch 'openjdk:master' into dev0530
>  - MacroAssembler::msub() takes a scratch register as an argument
>  - 8331558: AArch64: optimize integer remainder
>
>    On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction.
>
>    (1) The following test has passed, which shows performance improvement.
>
>    make test TEST="micro:java.lang.IntegerDivMod"
>    make test TEST="micro:java.lang.LongDivMod"
>
>    * IntegerDivMod.testDivideRemainderUnsigned
>    baseline(ns/ops) 2223
>    with this patch(ns/ops) 1885
>    improvement(%) 17.93%
>
>    * IntegerDivMod.testRemainderUnsigned
>    baseline(ns/ops) 2225
>    with this patch(ns/ops) 1885
>    improvement(%) 18.03%
>
>    * LongDivMod.testDivideRemainderUnsigned
>    baseline(ns/ops) 2231
>    with this patch(ns/ops) 1894
>    improvement(%) 17.79%
>
>    * LongDivMod.testRemainderUnsigned
>    baseline(ns/ops) 2232
>    with this patch(ns/ops) 1891
>    improvement(%) 18.03%
>
>    (2) jtreg test has passed
>
>    make run-test TEST=tier1

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 2298:

> 2296:   if (VM_Version::supports_a53mac() && Ra != zr)
> 2297:     nop();
> 2298:   if (VM_Version::is_neoverse()) {

Maybe a more descriptive name?

Suggestion:

  if (VM_Version::split_msub()) {

Is there a good reason why we're restricting this to msub and not madd?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19471#discussion_r1629015638

From aph at openjdk.org Thu Jun 6 08:34:46 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 6 Jun 2024 08:34:46 GMT
Subject: RFR: 8333343: [REDO] AArch64: optimize integer remainder [v3]
In-Reply-To:
References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com>
Message-ID:

On Thu, 6 Jun 2024 07:16:30 GMT, Jin Guojie wrote:

>> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 446:
>>
>>> 444:
>>> 445:   void msub(Register Rd, Register Rn, Register Rm, Register Ra, Register tmp = rscratch2);
>>> 446:   void msubw(Register Rd, Register Rn, Register Rm, Register Ra, Register tmp = rscratch2);
>>
>> Please delete these two methods that use rscratch2 as a default tmp register.
>
> I tried it and encountered a problem: msub/msubw will be called in other functions,
>
>   inline void mnegw(Register Rd, Register Rn, Register Rm) {
>     msubw(Rd, Rn, Rm, zr);
>   }
>
>   inline void mneg(Register Rd, Register Rn, Register Rm) {
>     msub(Rd, Rn, Rm, zr);
>   }
>
> If we add a parameter to msub/msubw, then
>
> 1. All these functions (mnegw, mneg, ...) that call msub/msubw also need to add parameters,
>
> 2. Moreover, all the calling functions of mnegw and mneg also need to be modified.
>
> The above two effects involve too many code changes.
>
> Please see if we can keep a default parameter for msub/msubw so that their calling functions do not need to be modified.

A hidden clobber of a scratch register makes msub dangerous to use, as the failure of the first version of this PR proved. You don't have to convert 100% of the uses of `msub`, just those that are frequently executed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19471#discussion_r1629020529

From stuefe at openjdk.org Thu Jun 6 08:36:46 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 6 Jun 2024 08:36:46 GMT
Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9]
In-Reply-To: <5BQVnl993FEOJPzQTIN-spFZSEtGtPJjFTrU_L87xrc=.b87b3e4a-a773-4c83-88d3-e038adc615f6@github.com>
References: <5BQVnl993FEOJPzQTIN-spFZSEtGtPJjFTrU_L87xrc=.b87b3e4a-a773-4c83-88d3-e038adc615f6@github.com>
Message-ID: <89F605eZgiLI2OBIoYugIaQz_wuiD1FZGVrSY00E-pA=.934cd754-1812-487a-be8b-ed06e104d7da@github.com>

On Thu, 6 Jun 2024 07:59:32 GMT, Patrick Zhang wrote:

> To clarify, the current revised patch does **NOT** add any if-cond to check for UEK; instead, it strengthens the check for the readiness of `MADV_POPULATE_WRITE` on `5.14 and beyond` (as stated by the official Linux kernel docs).
> In addition, the two test cases failing on the UEK kernel also justify strengthening the condition for `supportMadvPopulateWrite`. We cannot say that the problem exists only in the UEK kernel; in theory, any distro could backport part of the `MADV_POPULATE_WRITE` functionality from 5.14 to an earlier version, or reuse any madvise op code there, which is outside our control and our testing coverage.

The problem does not exist for other kernels if they don't break binary compatibility with the mainline kernel, which UEK does by introducing new madvise flags that occupy numerical slots that should be unoccupied. A clean downstream kernel can downport MADV_POPULATE_WRITE without problems since there will be no numerical clash.

Now, even if a clean downstream kernel were to downport support for MADV_POPULATE_WRITE, we would not benefit from it.

>
> Therefore, having the >=5.14 check is the safer option for everyone.

The real issue is a clash with UEK-proprietary flags. Adding proprietary flags is not a good practice. More clashes are waiting to happen, including clashes with the far more dangerous MADV_DOEXEC. I worry that someone, at some point, may add an innocuous patch in the future using one of these and trigger usage of MADV_DOEXEC. I would feel better if we had clear safeguards to prevent that.
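
A safeguard of that kind could look roughly like the following. This is only a minimal sketch under stated assumptions, not code from the patch: `KernelVersion`, `GuardedFlag`, `parse_kernel_version` and `can_use_madvise_flag` are hypothetical names, the kernel release string is assumed to come from something like `uname`, and the table holds only the two mainline flags (values 22 and 23, added in Linux 5.14) that are relevant to the 5.4-based kernel discussed here.

```cpp
#include <cstdio>

// Mainline flag values, defined locally so the sketch also builds against old headers.
constexpr int kMadvPopulateRead  = 22;  // mainline since Linux 5.14
constexpr int kMadvPopulateWrite = 23;  // mainline since Linux 5.14

struct KernelVersion {
  int major;
  int minor;
};

// Hypothetical helper: extracts "major.minor" from a uname-style release string
// such as "5.4.17-...". Returns false if the string does not start with two
// dot-separated numbers.
bool parse_kernel_version(const char* release, KernelVersion* out) {
  return std::sscanf(release, "%d.%d", &out->major, &out->minor) == 2;
}

// A flag together with the first mainline kernel version that defines it.
struct GuardedFlag {
  int flag;
  KernelVersion since;
};

static const GuardedFlag guarded_flags[] = {
  { kMadvPopulateRead,  {5, 14} },
  { kMadvPopulateWrite, {5, 14} },
};

// Returns false for a guarded flag when the host kernel predates the version
// that introduced it; on such a kernel the number may mean something else.
// Flags that are not in the table are not restricted by this check.
bool can_use_madvise_flag(int flag, KernelVersion host) {
  for (const GuardedFlag& g : guarded_flags) {
    if (g.flag == flag) {
      return host.major > g.since.major ||
             (host.major == g.since.major && host.minor >= g.since.minor);
    }
  }
  return true;
}
```

The point of keeping the table separate from the check is that guarding another flag later means adding one row, not another hand-written version comparison.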
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2151715722

From qpzhang at openjdk.org Thu Jun 6 09:11:55 2024
From: qpzhang at openjdk.org (Patrick Zhang)
Date: Thu, 6 Jun 2024 09:11:55 GMT
Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9]
In-Reply-To: <89F605eZgiLI2OBIoYugIaQz_wuiD1FZGVrSY00E-pA=.934cd754-1812-487a-be8b-ed06e104d7da@github.com>
References: <5BQVnl993FEOJPzQTIN-spFZSEtGtPJjFTrU_L87xrc=.b87b3e4a-a773-4c83-88d3-e038adc615f6@github.com>
 <89F605eZgiLI2OBIoYugIaQz_wuiD1FZGVrSY00E-pA=.934cd754-1812-487a-be8b-ed06e104d7da@github.com>
Message-ID:

On Thu, 6 Jun 2024 08:33:57 GMT, Thomas Stuefe wrote:

> The real issue is a clash with UEK-proprietary flags. Adding proprietary flags is not a good practice. More clashes are waiting to happen, including clashes with the far more dangerous MADV_DOEXEC.

Thanks @tstuefe, I agree with you on this point, so having the >=5.14 check is a way to protect the JVM from accidentally executing unexpected/dangerous ops like MADV_DOEXEC, although the if-cond is not an elegant piece of code for OpenJDK. @limingliu-ampere and I have not run tests (or any downport experiments) on pre-5.14 kernels, so we are not sure whether downported support for MADV_POPULATE_WRITE would function normally or bring any practical benefit.

@stefank @jdksjolen what's your opinion? With the change suggested by @tstuefe, the pretouch function on pre-5.14 UEK would keep the `bug` until https://github.com/oracle/linux-uek/issues/23 gets fixed by UEK, while `-XX:-UseMadvPopulateWrite` would be added to the two test cases so that the failures are gone (hidden). Please share your comments, thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2151788943 From duke at openjdk.org Thu Jun 6 10:48:53 2024 From: duke at openjdk.org (duke) Date: Thu, 6 Jun 2024 10:48:53 GMT Subject: Withdrawn: 8321266: Add diagnostic RSS threshold In-Reply-To: References: Message-ID: On Sun, 3 Dec 2023 12:51:24 GMT, Thomas Stuefe wrote: > We have `MallocLimit`, a way to trigger errors when reaching a given malloc load threshold. This PR proposes > a complementary switch, `RSSLimit`, that does the same based on the Resident Set Size of the process. > > --- > > Motivation: > > The main usage for this option is to analyze OOM kills. OOM kills can happen at various layers: the process may be either killed by the kernel OOM killer, or the whole container may get scrapped if it uses too much memory. > > One rarely has any information on the nature of the OOM, or if there even was one, and if yes, if the JVM was the culprit or just an innocent bystander. In these situations, getting a voluntary abort *before* the process gets killed from outside can give us valuable information. > > Another use of this feature can be testing: specifying an envelope of "reasonable" RSS for testing to check the expected footprint of the JVM. Also useful for a global test-wide setting to catch obvious footprint degradations early. > > Letting the JVM handle this Limit has many advantages: > > - since the limit is artificial, error reporting is not affected. Other mechanisms (e.g. ulimit) are likely to prevent effective error reporting. I usually get torn hs-err files when a limit restriction hits since error reporting needs dynamic memory (regrettably) and space on the stack to do its work. > > - Re-using the normal error reporting mechanism is powerful since: > - hs-err files contain lots of information already: machine memory status, NMT summary, heap information etc. 
> - Using `OnError`, that mechanism is expandable: we can run many further diagnostics like Metaspace or Compiler memory reports, detailed NMT reports, System memory maps, and even heap dumps. > - Using `ErrorLogToStd(out|err)` will redirect the hs-err file and let us see what's happening in cloud situations where file systems are often ephemeral. > > ---- > > Usage: > > Limit is given either as an absolute number or as a relative percentage of the total memory of the machine or the container, e.g. > `-XX:RssLimit=2G` or `-XX:RssLimit=80%`. > > If given as percent, JVM will also react to container limit updates. > > Example: we run the JVM inside a container as the sole payload process. Limit its RSS to 90% of the container limit, and in case we run into the limit, fire a heap dump: > > `java -XX:+UnlockDiagnosticVMOptions -XX:RssLimit=80% '-XX:OnError=jcmd %p GC.heap_dump my-dump' -Xlog:os+rss ` > > ---- > > Patch: > > Implemented for Linux, MacOS and Windows. Left out AIX since there we have a long-... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16938 From shade at openjdk.org Thu Jun 6 10:51:44 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 6 Jun 2024 10:51:44 GMT Subject: RFR: 8325984: 4 jcstress tests are failing in Tier6 4 times each In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 19:21:56 GMT, Jorn Vernee wrote: > These 4 tests were failing due to an incompatibility with jcstress. They were problemlisted in past (https://bugs.openjdk.org/browse/JDK-8326062). > > Now that jcstress has been updated (https://github.com/openjdk/jdk/pull/19332) with the relevant fix (https://github.com/openjdk/jcstress/pull/147), we can re-enable these tests. > > Testing: I've verified that all 4 tests now pass on Linux-x64 I think only Oracle CIs run these tests through jtreg wrappers? Anyway, this looks good to me. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19565#pullrequestreview-2101607822 From mdoerr at openjdk.org Thu Jun 6 10:59:46 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 6 Jun 2024 10:59:46 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v4] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 10:35:26 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > [PPC64] saving and restoring CR is not needed at most places The tests have passed on our side. I can approve it once the Copyright years are updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2152013268 From varadam at openjdk.org Thu Jun 6 11:24:02 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 6 Jun 2024 11:24:02 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v5] In-Reply-To: References: Message-ID: > PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. > Fastdebug: build and tier1 testing successful. [unrelated failures] Varada M has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'master' into LR - [PPC64] saving and restoring CR is not needed at most places - [PPC64] saving and restoring CR is not needed at most places - [PPC64] saving and restoring CR is not needed at most places - [PPC64] saving and restoring CR is not needed at most places - [PPC64] saving and restoring CR is not needed at most places - [PPC64] saving and restoring CR is not needed at most places ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19494/files - new: https://git.openjdk.org/jdk/pull/19494/files/3445e9ac..5938878b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19494&range=03-04 Stats: 90827 lines in 1720 files changed: 65308 ins; 17165 del; 8354 mod Patch: https://git.openjdk.org/jdk/pull/19494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19494/head:pull/19494 PR: https://git.openjdk.org/jdk/pull/19494 From varadam at openjdk.org Thu Jun 6 11:24:02 2024 From: varadam at openjdk.org (Varada M) Date: Thu, 6 Jun 2024 11:24:02 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v4] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 10:56:52 GMT, Martin Doerr wrote: >> Varada M has updated the pull request incrementally with one additional commit since the last revision: >> >> [PPC64] saving and restoring CR is not needed at most places > > The tests have passed on our side. I can approve it once the Copyright years are updated. Thanks @TheRealMDoerr for running the tests. Only tier1 is completed from my side which looks fine. @offamitkumar I have updated the copyright headers. 
Thank you ------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2152096579 From simonis at openjdk.org Thu Jun 6 11:42:49 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 6 Jun 2024 11:42:49 GMT Subject: RFR: 8329421: Native methods can not be selectively printed [v2] In-Reply-To: References: Message-ID: <4Uhx8KaPA5l7EzeBd-gmnt5ORqt-UkLzUrDZIoL7KJQ=.6165f970-1281-4765-bc2f-445fa5ea0631@github.com> On Tue, 2 Apr 2024 07:23:25 GMT, Volker Simonis wrote: >> Native methods (i.e. "native wrappers") can not be selectively printed with `-XX:CompileCommand=print,class::method`. Currently the only way to print native methods is to use the global `-XX:+PrintAssembly` option. But this prints *all* compiled methods which can be too much if we're just interested in a specific native wrapper. There's no reason to not apply `-XX:CompileCommand` options correctly to native methods as well. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Add test for -XX:+PrintNativeNMethods Still not sure how exactly it reproduces for you, but I can now reproduce the issue if I configure with `--with-jvm-variants=core --disable-jvm-feature-zgc --disable-jvm-feature-shenandoahgc --disable-jvm-feature-g1gc --disable-jvm-feature-parallelgc`. Otherwise I'm always running into compilation errors in one of the GC-related files. Maybe you compile on a platform where all these GCs are excluded by default or you excluded them manually? Either way, I think I found the problem. I'll open a JBS issue and post a PR with a fix in a few minutes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18567#issuecomment-2152160132 From amitkumar at openjdk.org Thu Jun 6 12:10:46 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 6 Jun 2024 12:10:46 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v5] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 11:24:02 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into LR > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places Looks good to me. I ran tier1 & vector test on `ppc-le` and these are the results: failures in tier1 (looks unrelated to this PR): gtest/GTestWrapper.java java/util/ResourceBundle/Control/MissingResourceCauseTestRun.java jdk/javadoc/doclet/testIOException/TestIOException.java Vector tests: Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:./test/jdk/jdk/incubator/vector 78 78 0 0 ============================== TEST SUCCESS `` ------------- Marked as reviewed by amitkumar (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/19494#pullrequestreview-2101765929 From mdoerr at openjdk.org Thu Jun 6 12:24:47 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 6 Jun 2024 12:24:47 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v5] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 11:24:02 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into LR > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19494#pullrequestreview-2101793809 From stuefe at openjdk.org Thu Jun 6 12:40:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Jun 2024 12:40:51 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 03:33:30 GMT, Liming Liu wrote: >> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. 
>> This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14.
>
> Liming Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix the wrong condition

Meanwhile, I am warming to the current approach. I understand that it avoids referring to individual downstream vendors, which I agree may be brittle.

My main concern is to prevent future flag mismatches. Therefore, my proposal is to do what this patch does, but in a more generic way. Essentially, encoding that for certain flags, we cannot rely on older kernels correctly ignoring them. But we assume that downstream kernel vendors will at least fix conflicts when they merge in flags from mainline. We sacrifice the ability to benefit from vendor-specific backports, but that is the compromise.

The flags I'd like to guard for now are:

1) UEK7: MADV_DONTNEED_LOCKED -> MADV_DOEXEC
2) UEK7: MADV_COLLAPSE -> MADV_DONTEXEC
3) UEK6: MADV_POPULATE_READ -> MADV_DOEXEC
4) UEK6: MADV_POPULATE_WRITE -> MADV_DONTEXEC

If the vendor keeps up its routine of just shifting the proprietary flags to the end of the numerical MADV range for each new mainline flag, we will continue to have problems and this list may grow.

The mechanism could be very close to what @limingliu-ampere does now, only a tad more generic. E.g.:

  bool os::Linux::can_use_madvise_flag(int someflag) {
    // have a hardcoded array of { flag, kernel version } tuples.
    // Search it for someflag, and if found, return false if host kernel version is older than the encoded version.
    // Otherwise return true.
  }

and then maybe wrap the madvise call with something like this:

  bool os::Linux::checked_madvise(..., someflag) {
    assert(can_use_madvise_flag(someflag))
    call real madvise
  }

in addition to something like this in initialization:

  if (UseMadvPopulateWrite && !can_use_madvise_flag(MADV_POPULATE_WRITE)) {
    FLAG_SET_ERGO(UseMadvPopulateWrite, false);
  }

Do you like this, does this make sense?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2152304920

From simonis at openjdk.org Thu Jun 6 13:41:51 2024
From: simonis at openjdk.org (Volker Simonis)
Date: Thu, 6 Jun 2024 13:41:51 GMT
Subject: RFR: 8329421: Native methods can not be selectively printed [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Jun 2024 01:26:49 GMT, Wang Haomin wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add test for -XX:+PrintNativeNMethods
>
> First, I built on aarch64-linux. `git checkout jdk-23+17` or `git checkout 3057dded4878b0110bc2c09b52019570a0a31c9f`.
> Secondly, configure with `--with-debug-level=fastdebug --with-jvm-variants=core`.
> Finally, `make images CONF=core`.

@haominw , I have now created [JDK-8333722: 8333722: Fix CompilerDirectevies for non-compiler JVM variants](https://bugs.openjdk.org/browse/JDK-8333722) to track this issue and submitted #19578 with a proposed fix.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18567#issuecomment-2152539240

From stuefe at openjdk.org Thu Jun 6 14:15:46 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 6 Jun 2024 14:15:46 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage
In-Reply-To:
References:
Message-ID:

On Fri, 26 Apr 2024 14:29:53 GMT, Johan Sjölen wrote:

> Hi,
>
> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.
>
> We then use this allocator in order to store the `NativeCallStackStorage` hash table.
> This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.
>
> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>
> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>
> The results are as follows on linux-x64-slowdebug:
>
>
> Generate stacks... Done
> Time taken with GrowableArray: 8341.240945
> Time taken with CHeap: 12189.031318
> Time taken with Arena: 8800.703092
> Time taken with GrowableArray again: 8295.508829
>
>
> And on linux-x64:
>
>
> Time taken with GrowableArray: 8378.018135
> Time taken with CHeap: 12437.347868
> Time taken with Arena: 8758.064717
> Time taken with GrowableArray again: 8391.076291
>
>
> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.

Started looking at this. I like this.

The smaller beauty spot of your implementation is that by relying on GrowableHeapArray, we are bound to the size of its index type, which is int. So we will never be able to hold more than 2G-1 entries. I think that's okay for now though, and if not, it's easy to change in GrowableArray.

A bigger beauty spot is that it needs explicit initialization for every member (IIUC). That feels just wasteful, especially in combination with the generous hard-coded power-of-2 growth of GrowableArray.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 23:

> 21:     return idx == other.idx;
> 22:   }
> 23: };

Why so much code? Why not just typedef to uint32?

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 27:

> 25:   // A free list allocator element is either a link to the next free space
> 26:   // Or an actual element.
> 27:   union BackingElement {

No need for this to be public

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 38:

> 36:     BackingElement(E& e) {
> 37:       this->e = e;
> 38:     }

What would be cool is for this data structure to work with any E without requiring E to implement default- and copy-constructors. E.g. an E with a mandatory const member and a single constructor that inits that member.

Maybe we could do that by not storing E but making sure there is enough space for E, like this:

union BackingElement {
  I link;
  char x[sizeof(E)];
};

Then use x and placement new, which you do already. sizeof(E) would also account for intra-array padding needed to guarantee E alignment. All we need to make sure then is to align the start pointer of the first BackingElement, but since that one is malloced from C-heap, it's already aligned.

And I think we don't really need any constructors here.

Also, please remove unnecessary this->. Please use initializer lists.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 41:

> 39:   };
> 40:   GrowableArrayCHeap backing_storage;
> 41:   I free_start;

Please use underscore for member names (like, everywhere :)

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 45:

> 43:   IndexedFreeListAllocator()
> 44:   : backing_storage(8),
> 45:   free_start(I{0}) {}

weird indentation. Also, please expose the init size to outside, possibly with a sensible default like your 8

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 58:

> 56:       // Follow the link to the next free element
> 57:       free_start = be.link;
> 58:     }

This feels a bit uncommon and contrary to the normal linked list implementation.
If possible, I would opt for a more traditional approach that is easier to understand and does not rely on at_grow (e.g. in case we want to adapt this to some other form of backing storage).

e.g.

if (_freelist == -1) {
  allocate new slot
} else {
  reuse slot at freelist
}

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 67:

> 65:     be_freed.link = free_start;
> 66:     free_start = i;
> 67:   }

I don't think a free(index) is very useful. We need a free(E* e), that calculates I from &e. (Your one real world usage example, NativeCallStackStorage, conveniently never frees :=)

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 71:

> 69:   E& operator[](I i) {
> 70:     return backing_storage.at(i.idx).e;
> 71:   }

Do we need this?

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 75:

> 73:   E& translate(I i) {
> 74:     return backing_storage.at(i.idx).e;
> 75:   }

- I'd just call this function "at" as we usually do.
- I also would provide a const variant that exposes const E&.
- Please assert for i to be correct (not nil, not oob)

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 77:

> 75:   }
> 76: };
> 77:

What are these allocators for?
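
Taken together, the comments above point toward a shape roughly like the following. It is only a sketch under assumptions, not the code in the PR: it is standalone C++ that uses `std::vector` in place of `GrowableArrayCHeap`, and `IndexedFreeList`, `allocate`, `free`, `at`, `nil` and the `Tagged` example type are hypothetical names. It shows raw per-slot storage with placement new, a `-1` sentinel for the empty free list, underscore-prefixed members, a configurable initial capacity, and asserting `at()` accessors in const and non-const form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>
#include <vector>

// Assumption: E can be moved with a plain byte copy when the backing vector grows.
template <typename E>
class IndexedFreeList {
public:
  using I = int32_t;            // 4-byte "pointer"
  static constexpr I nil = -1;  // marks the end of the free list

private:
  // A slot is either a link to the next free slot or raw storage for one E.
  // Storing bytes instead of an E member means E needs no default constructor.
  union Slot {
    I link;
    alignas(E) unsigned char bytes[sizeof(E)];
  };

  std::vector<Slot> _slots;
  I _free_start;

public:
  explicit IndexedFreeList(size_t initial_capacity = 8) : _free_start(nil) {
    _slots.reserve(initial_capacity);
  }

  // Constructs an E in a reused slot if one is free, else in a new slot.
  template <typename... Args>
  I allocate(Args&&... args) {
    I i;
    if (_free_start == nil) {
      _slots.emplace_back();
      i = static_cast<I>(_slots.size() - 1);
    } else {
      i = _free_start;
      _free_start = _slots[i].link;  // follow the link to the next free slot
    }
    ::new (static_cast<void*>(_slots[i].bytes)) E(std::forward<Args>(args)...);
    return i;
  }

  // Destroys the element and pushes its slot onto the free list.
  void free(I i) {
    at(i).~E();
    _slots[i].link = _free_start;
    _free_start = i;
  }

  E& at(I i) {
    assert(i != nil && static_cast<size_t>(i) < _slots.size());
    return *reinterpret_cast<E*>(_slots[i].bytes);
  }

  const E& at(I i) const {
    assert(i != nil && static_cast<size_t>(i) < _slots.size());
    return *reinterpret_cast<const E*>(_slots[i].bytes);
  }
};

// Example element type with a single constructor and no default constructor.
struct Tagged {
  int id;
  explicit Tagged(int i) : id(i) {}
};
```

With this layout a freed slot is handed out again by the next `allocate`, so indices stay dense; as in the PR, the backing storage never shrinks.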
------------- PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2101843216 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629471631 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629574300 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629474658 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629567480 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629572647 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629622542 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629627129 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629576662 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629578280 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629637028 From stuefe at openjdk.org Thu Jun 6 14:15:47 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Jun 2024 14:15:47 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 14:09:23 GMT, Thomas Stuefe wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. 
It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 67: > >> 65: be_freed.link = free_start; >> 66: free_start = i; >> 67: } > > I don't think a free(index) is very useful. We need a free(E* e), that calculates I from &e. (Your one real world usage example, NativeCallStackStorage, conveniently never frees :=) And we may want a function, complementing `at()`, that gives us the index for a given element pointer. 
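A possible shape for the function complementing `at()`, sketched under the assumption that all slots live in one contiguous array and that the payload is the first member of each slot. The names `Slot` and `index_of` are hypothetical and do not come from the PR; a `free(E* e)` could then simply forward to `free(index_of(e))`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical slot layout: payload first, free-list link second.
template <typename E>
struct Slot {
  E e;
  int32_t link;
};

// Returns the index of the slot whose payload is *p.
// base/count describe the contiguous slot array owned by the allocator.
template <typename E>
int32_t index_of(const Slot<E>* base, size_t count, const E* p) {
  // With a standard-layout Slot, &slot.e and &slot are the same address.
  static_assert(std::is_standard_layout<Slot<E>>::value,
                "sketch assumes the payload sits at offset 0");
  const uintptr_t b = reinterpret_cast<uintptr_t>(base);
  const uintptr_t a = reinterpret_cast<uintptr_t>(p);
  assert(a >= b && a < b + count * sizeof(Slot<E>) && "pointer not from this array");
  assert((a - b) % sizeof(Slot<E>) == 0 && "pointer not at a slot boundary");
  return static_cast<int32_t>((a - b) / sizeof(Slot<E>));
}
```

The two asserts also catch the common mistake of passing a pointer that was handed out by a different allocator instance.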
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629633637 From stuefe at openjdk.org Thu Jun 6 14:18:45 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Jun 2024 14:18:45 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage In-Reply-To: References: Message-ID: <0sXgKQEHW80r3Y_NH8TdGV-CumNOpsfNQoagmRhFTDo=.56f448da-a92f-4d9b-9d74-5e82817ce166@github.com> On Thu, 6 Jun 2024 14:06:22 GMT, Thomas Stuefe wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... 
Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 58: > >> 56: // Follow the link to the next free element >> 57: free_start = be.link; >> 58: } > > This feels a bit uncommon and contrary to the normal linked list implementation. If possible, I would opt for a more traditional approach that is easier to understand and does not rely on at_grow (e.g. in case we want adapt this to some other form of backing storage). > > e.g. > > > if (_freelist == -1) { > allocate new slot > } else { > reuse slot at freelist > } Note that at_grow also could have the problem that an error, resulting in a high index, would cause a large allocation to happen. I think this class could use an optional max value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629643025 From heidinga at openjdk.org Thu Jun 6 14:56:52 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Thu, 6 Jun 2024 14:56:52 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v5] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: On Mon, 3 Jun 2024 21:23:59 GMT, Ioi Lam wrote: >> ### Overview >> >> This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. 
>> >> I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, >> - `B` is the same class as `A`; or >> - `B` is a supertype of `A`; or >> - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. >> >> Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. >> >> Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. >> >> (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) >> >> ### Static CDS Archive >> >> This feature is implemented in three steps for static CDS archive dump: >> >> 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: >> >> @cp java/util/Objects 2 19 106 >> >> 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. >> >> 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. >> >> ### Dynamic CDS Archive >> >> When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. 
We only perform step 3 when the archive is being written. >> >> ### Limitations >> >> - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. >> - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to... > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - Added test case for safety with putfield against final fields (related to JDK-8157181) > - Moved the test ResolvedConstants.java to resolvedConstants, as we will have more tests cases in this area Marked as reviewed by heidinga (no project role). ------------- PR Review: https://git.openjdk.org/jdk/pull/19355#pullrequestreview-2102227890 From heidinga at openjdk.org Thu Jun 6 14:56:53 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Thu, 6 Jun 2024 14:56:53 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <2cV0qix4YBr6H58RLaKdtiRmmiQ222IFHGH8kw-bWCY=.278bc880-4b3e-481c-90df-dd45a94f7822@github.com> Message-ID: On Mon, 3 Jun 2024 19:13:54 GMT, Ioi Lam wrote: >> This makes sense. I will try to prototype it in the Leyden repo and then update this PR. > > I tried skipping the `methodHandle` parameter to `InterpreterRuntime::resolve_get_put` but it's more complicated than I thought. > > 1. The `fieldDescriptor::has_initialized_final_update()` will return true IFF the class has `putfield` bytecodes to a final field outside of `` methods. See [here](https://github.com/openjdk/jdk/blob/9686e804a2b058955ff88149c54a0a7896c0a2eb/src/hotspot/share/interpreter/rewriter.cpp#L463) > 2. 
When `InterpreterRuntime::resolve_get_put` is called for a `putfield`, it adds `putfield` to the `ResolvedFieldEntry` only if `fieldDescriptor::has_initialized_final_update()` is false. See [here](https://github.com/openjdk/jdk/blob/9686e804a2b058955ff88149c54a0a7896c0a2eb/src/hotspot/share/interpreter/interpreterRuntime.cpp#L703) > 3. `InterpreterRuntime::resolve_get_put`calls `LinkResolver::resolve_field()`, which always checks if the `methodHandle` is `` or not, without consulting `fieldDescriptor::has_initialized_final_update()`. See [here](https://github.com/openjdk/jdk/blob/9686e804a2b058955ff88149c54a0a7896c0a2eb/src/hotspot/share/interpreter/linkResolver.cpp#L1040) > > (2) is an optimization -- if a method sets final fields only inside `` methods, we should fully resolve the `putfield` bytecodes. Otherwise every such `putfield` will result in a VM call. > > (3) is for correctness -- make sure that only `` can modify final fields. > > I am pretty sure (2) and (3) are equivalent. I.e., we should check against the method in (3) only if `fieldDescriptor::has_initialized_final_update()` is true. However, (3) is security related code, so I don't want to change it inside an optimization PR like this one. Without fixing that, I cannot call `InterpreterRuntime::resolve_get_put` with a null `methodHandle`, as it will hit the assert. > > This goes back to my original point -- I'd rather do something stupid but correct (call the existing APIs and live with the existing behavior), rather than trying to analyze the resolution code and see if we can skip certain checks. I get what you're saying... and the `fieldDescriptor::has_initialized_final_update()` is a nice optimization that we don't want to mess up. Getting the straightforward code in makes sense as we can optimize it later given it runs at CDS archive build which isn't performance critical. 
Down the line, we can add new entry points into the LinkResolver for use by CDS that refuse to resolve final fields and avoid this kind of issue but that's a future problem. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1629702253 From jvernee at openjdk.org Thu Jun 6 15:21:45 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 6 Jun 2024 15:21:45 GMT Subject: RFR: 8325984: 4 jcstress tests are failing in Tier6 4 times each In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 10:48:51 GMT, Aleksey Shipilev wrote: > I think only Oracle CIs run these tests through jtreg wrappers? We do run them in our CI. Not sure who else runs them that way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19565#issuecomment-2152799029 From jsjolen at openjdk.org Thu Jun 6 15:25:45 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 6 Jun 2024 15:25:45 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 14:29:53 GMT, Johan Sj?len wrote: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. 
It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Thanks Thomas! Sorry for the messy code, I felt that it was in good enough shape to show people and see what they think. I'll clean it up and ping you when it's ready. I answered some of your comments also. / Johan ------------- PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2102280016 From jsjolen at openjdk.org Thu Jun 6 15:25:47 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 6 Jun 2024 15:25:47 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 12:46:20 GMT, Thomas Stuefe wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. 
>> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 38: > >> 36: BackingElement(E& e) { >> 37: this->e = e; >> 38: } > > What would be cool is for this data structure to work with any E without requiring E to implement default- and copy-constructors. E.g. an E with a mandatory const member and a single constructor that inits that member. 
>
> Maybe we could do that by not storing E but making sure there is enough space for E, like this:
>
>
> union BackingElement {
>   I link;
>   char x[sizeof(E)];
> };
>
>
> Then use x and placement new, which you do already.
>
> sizeof(E) would also account for intra-array padding needed to guarantee E alignment. All we need to make sure then is to align the start pointer of the first BackingElement, but since that one is malloced from C-heap, it's already aligned.
>
> And I think we don't really need any constructors here.
>
> Also, please remove unnecessary this->. Please use initializer lists.

That's a good idea. It probably should be:

```c++
union alignas(E) BackingElement {
  I link;
  char e[sizeof(E)];
};
```

> src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 71:
>
>> 69:   E& operator[](I i) {
>> 70:     return backing_storage.at(i.idx).e;
>> 71:   }
>
> Do we need this?

It's nice when your other operation is called `translate`, maybe not as nice when it's called `at`.

> src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 77:
>
>> 75:   }
>> 76: };
>> 77:
>
> What are these allocators for?

So you can easily swap out your allocator if you have a bug and want to ensure the bug isn't in your allocator. They are also there for completeness, and they are necessary for the performance tests. We can remove them, but I think they're nice to have.
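To make the storage-only `BackingElement` idea from this exchange concrete, here is a small sketch that constructs an `E` without a default constructor into the raw bytes using placement new. It is an illustration under assumptions, not the PR's code: `NoDefault`, `construct_in` and `destroy_in` are hypothetical names, and the `alignas(E)` is placed on the byte array rather than on the union so that the sketch also compiles when `E` is less strictly aligned than the link type.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <utility>

// An element type with a const member and no default constructor.
struct NoDefault {
  const int id;
  explicit NoDefault(int i) : id(i) {}
};

// Either a free-list link or raw, suitably aligned storage for one E.
template <typename E, typename I = int32_t>
union BackingElement {
  I link;
  alignas(E) char e[sizeof(E)];
};

// Construct an E in place inside the slot's storage.
template <typename E, typename... Args>
E* construct_in(BackingElement<E>& be, Args&&... args) {
  return ::new (static_cast<void*>(be.e)) E(std::forward<Args>(args)...);
}

// Destroy the E before the slot is reused as a free-list link.
template <typename E>
void destroy_in(E* p) {
  p->~E();
}
```

The union itself needs no constructors; the allocator decides whether a slot currently holds a link or a live `E`.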
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629735033
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629743150
PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629745072

From jsjolen at openjdk.org Thu Jun 6 15:25:48 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 6 Jun 2024 15:25:48 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage
In-Reply-To: 
References: 
Message-ID: <0bJNjeaUTS14H_sSjeVW1l2p9lBkLJmAT_8zUaj3QV8=.64e6f390-c81d-45cb-92b7-a58b27db21e0@github.com>

On Thu, 6 Jun 2024 14:10:58 GMT, Thomas Stuefe wrote:

>> src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 67:
>>
>>> 65:     be_freed.link = free_start;
>>> 66:     free_start = i;
>>> 67:   }
>>
>> I don't think a free(index) is very useful. We need a free(E* e), that calculates I from &e. (Your one real world usage example, NativeCallStackStorage, conveniently never frees :=)
>
> And we may want a function, complementing `at()`, that gives us the index for a given element pointer.

I did start a port of the Treap to this allocator interface and there was no particular problem with using `free(I)` instead of `free(E* e)`. I gave up on that because it was a much larger change and I didn't know what the response would be to the interface itself (plus, we really do want to give back memory). Performing an inversion might not be easy. Consider something like a chunk allocator like our arena, translating index (`uint16_t chunk; uint16_t index;`) to a pointer requires walking the chunks.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629742386

From jsjolen at openjdk.org Thu Jun 6 15:25:48 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 6 Jun 2024 15:25:48 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage
In-Reply-To: <0bJNjeaUTS14H_sSjeVW1l2p9lBkLJmAT_8zUaj3QV8=.64e6f390-c81d-45cb-92b7-a58b27db21e0@github.com>
References: <0bJNjeaUTS14H_sSjeVW1l2p9lBkLJmAT_8zUaj3QV8=.64e6f390-c81d-45cb-92b7-a58b27db21e0@github.com>
Message-ID: <2RbJRlC5I0ZDwHLIaMj2mhT5v15zwC7t9_UBsbcCYxw=.79487b0d-30aa-4405-9141-e8daa02bf52b@github.com>

On Thu, 6 Jun 2024 15:19:16 GMT, Johan Sjölen wrote:

>> And we may want a function, complementing `at()`, that gives us the index for a given element pointer.
>
> I did start a port of the Treap to this allocator interface and there was no particular problem with using `free(I)` instead of `free(E* e)`. I gave up on that because it was a much larger change and I didn't know what the response would be to the interface itself (plus, we really do want to give back memory). Performing an inversion might not be easy. Consider something like a chunk allocator like our arena, translating index (`uint16_t chunk; uint16_t index;`) to a pointer requires walking the chunks.

The tests also use the freeing functionality.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629745865

From jsjolen at openjdk.org Thu Jun 6 15:32:44 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 6 Jun 2024 15:32:44 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage
In-Reply-To: <2RbJRlC5I0ZDwHLIaMj2mhT5v15zwC7t9_UBsbcCYxw=.79487b0d-30aa-4405-9141-e8daa02bf52b@github.com>
References: <0bJNjeaUTS14H_sSjeVW1l2p9lBkLJmAT_8zUaj3QV8=.64e6f390-c81d-45cb-92b7-a58b27db21e0@github.com> <2RbJRlC5I0ZDwHLIaMj2mhT5v15zwC7t9_UBsbcCYxw=.79487b0d-30aa-4405-9141-e8daa02bf52b@github.com>
Message-ID: <2v-ZxZQyu7d1glKb1xaZyz6aEu8tN7Hpil4TLq6g4AY=.1a6f2385-b0a7-464e-825f-3ccb8913e585@github.com>

On Thu, 6 Jun 2024 15:21:42 GMT, Johan Sjölen wrote:

>> I did start a port of the Treap to this allocator interface and there was no particular problem with using `free(I)` instead of `free(E* e)`. I gave up on that because it was a much larger change and I didn't know what the response would be to the interface itself (plus, we really do want to give back memory). Performing an inversion might not be easy. Consider something like a chunk allocator like our arena, translating index (`uint16_t chunk; uint16_t index;`) to a pointer requires walking the chunks.
>
> The tests also use the freeing functionality.

> Performing an inversion might not be easy. Consider something like a chunk allocator like our arena, translating index (uint16_t chunk; uint16_t index;) to a pointer requires walking the chunks.

Wait, I should've rested before I wrote that. Obviously we need to be able to make fast inversions for accessing the underlying element anyway. Still, the rest of what I wrote makes sense!
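One way to see why a `(chunk, index)` handle does not have to walk anything, sketched under the assumption that the allocator keeps a directory of chunk base pointers. This is a hypothetical design for illustration only and not how the HotSpot arena is implemented; the names `ChunkedIndex` and `ChunkedStoreSketch` are invented, and `E` is assumed to be default-constructible and copyable.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// 4-byte handle: which chunk, and which slot inside that chunk.
struct ChunkedIndex {
  uint16_t chunk;
  uint16_t index;
};

template <typename E, uint16_t ChunkSize = 256>
class ChunkedStoreSketch {
  std::vector<std::unique_ptr<E[]>> _chunks;  // directory of chunk base pointers
  uint16_t _used_in_last = ChunkSize;         // "full" so the first allocate() adds a chunk

public:
  ChunkedIndex allocate(const E& value) {
    if (_used_in_last == ChunkSize) {
      // Current chunk is full (or none exists yet): add a new one.
      _chunks.push_back(std::unique_ptr<E[]>(new E[ChunkSize]));
      _used_in_last = 0;
    }
    ChunkedIndex i{static_cast<uint16_t>(_chunks.size() - 1), _used_in_last++};
    at(i) = value;
    return i;
  }

  // Handle -> element is two array lookups, independent of the number of chunks.
  E& at(ChunkedIndex i) {
    assert(i.chunk < _chunks.size() && "chunk out of bounds");
    assert(i.index < ChunkSize && "slot out of bounds");
    return _chunks[i.chunk][i.index];
  }
};
```

The opposite direction, from an element pointer back to a handle, would still need a search over the chunk directory in this design, which matches the concern raised earlier in the thread.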
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629757893 From stuefe at openjdk.org Thu Jun 6 16:01:43 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Jun 2024 16:01:43 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 16:26:13 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... 
Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > An obvious improvement: Have the returned pointers remember which allocator object it was allocated from when in debug mode to avoid free:ing an element using the wrong allocator. You can also support detecting double frees by assigning each allocation a unique id. @jdksjolen Github is being weird again and shows all your comments twice. Bad Github. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2152881898 From stuefe at openjdk.org Thu Jun 6 16:01:45 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Jun 2024 16:01:45 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 15:14:33 GMT, Johan Sj?len wrote: >> src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 38: >> >>> 36: BackingElement(E& e) { >>> 37: this->e = e; >>> 38: } >> >> What would be cool is for this data structure to work with any E without requiring E to implement default- and copy-constructors. E.g. an E with a mandatory const member and a single constructor that inits that member. >> >> Maybe we could do that by not storing E but making sure there is enough space for E, like this: >> >> >> union BackingElement { >> I link; >> char x[sizeof(E)]; >> }; >> >> >> Then use x and placement new, which you do already. >> >> sizeof(E) would also account for intra-array padding needed to guarantee E alignment. 
All we need to make sure then is to align the start pointer of the first BackingElement, but since that one is malloced from C-heap, its already aligned. >> >> And I think we don't really need any constructors here. >> >> Also, please remove unnecessary this->. Please use initializer lists. > > That's a good idea. It probably should be: > > ```c++ > union alignas(E) BackingElement { > I link; > char e[sizeof(E)]; > }; Even better ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1629800717 From sgibbons at openjdk.org Thu Jun 6 17:44:05 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 6 Jun 2024 17:44:05 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v52] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 16:16:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 
16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright & a couple of comment typos Hi, everyone. I see that JDK 23 has now been forked, and new commits go into the JDK 24 branch. I would like to get this in as soon as possible to have as much time with fuzzers, etc. for everyone to be confident in the code. I have 3 positive reviews on this PR and would like to integrate. Please reply as soon as you reasonably can with objections or approval and I will integrate. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2153072708 From kvn at openjdk.org Thu Jun 6 18:28:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Jun 2024 18:28:07 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v52] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 16:16:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright & a couple of comment typos Let me do quick testing with latest mainline (JDK 24 now). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2153142794 From cslucas at openjdk.org Thu Jun 6 19:15:55 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Jun 2024 19:15:55 GMT Subject: RFR: 8333566: Remove unused methods Message-ID: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. 
Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. ------------- Commit messages: - Temove trailing whitespace. - Removing unused methods in aarch64 - Merge remote-tracking branch 'origin/main' into unused-methods - Merge remote-tracking branch 'origin/main' into unused-methods - Remove defined but unused methods. Changes: https://git.openjdk.org/jdk/pull/19550/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19550&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333566 Stats: 1398 lines in 139 files changed: 1 ins; 1290 del; 107 mod Patch: https://git.openjdk.org/jdk/pull/19550.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19550/head:pull/19550 PR: https://git.openjdk.org/jdk/pull/19550 From amitkumar at openjdk.org Thu Jun 6 19:15:56 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 6 Jun 2024 19:15:56 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. 
src/hotspot/cpu/s390/vm_version_s390.hpp line 516: > 514: static void set_has_CompareTrap() { _features[0] |= GnrlInstrExtFacilityMask; } > 515: static void set_has_RelativeLoadStore() { _features[0] |= GnrlInstrExtFacilityMask; } > 516: static void set_has_GnrlInstrExtensions() { _features[0] |= GnrlInstrExtFacilityMask; } I know this PR is still in draft state. Just a thought: I would like to keep the methods in `vm_version_s390.hpp` file for now. I'm planning to remove the checks applicable to older hardware. So it would be better, If I clean these methods as a part of that PR :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19550#discussion_r1628627936 From cslucas at openjdk.org Thu Jun 6 19:15:56 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Jun 2024 19:15:56 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: <_Nv0J1Vf3F9z9S9N-eKLMy0JOOt9KHBa7l6D7OepBjc=.9a24926d-33ec-469d-bdee-5c0efed17fb5@github.com> On Thu, 6 Jun 2024 01:28:00 GMT, Amit Kumar wrote: >> Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. >> >> Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 >> >> Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. > > src/hotspot/cpu/s390/vm_version_s390.hpp line 516: > >> 514: static void set_has_CompareTrap() { _features[0] |= GnrlInstrExtFacilityMask; } >> 515: static void set_has_RelativeLoadStore() { _features[0] |= GnrlInstrExtFacilityMask; } >> 516: static void set_has_GnrlInstrExtensions() { _features[0] |= GnrlInstrExtFacilityMask; } > > I know this PR is still in draft state. Just a thought: I would like to keep the methods in `vm_version_s390.hpp` file for now. 
I'm planning to remove the checks applicable to older hardware. So it would be better, If I clean these methods as a part of that PR :-) Sounds good to me! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19550#discussion_r1629762442 From amitkumar at openjdk.org Thu Jun 6 19:23:46 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 6 Jun 2024 19:23:46 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. 
src/hotspot/cpu/s390/vm_version_s390.hpp line 516: > 514: static void set_has_CompareTrap() { _features[0] |= GnrlInstrExtFacilityMask; } > 515: static void set_has_RelativeLoadStore() { _features[0] |= GnrlInstrExtFacilityMask; } > 516: static void set_has_ProcessorAssist() { _features[0] |= ProcessorAssistMask; } This looks incorrect; there exist a second definition below; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19550#discussion_r1630102121 From jbhateja at openjdk.org Thu Jun 6 19:31:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 6 Jun 2024 19:31:03 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: > Summary of changes include with the patch:- > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18562/files - new: https://git.openjdk.org/jdk/pull/18562/files/0881e43c..b5da0938 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=00-01 Stats: 34 lines in 3 files changed: 14 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18562/head:pull/18562 PR: https://git.openjdk.org/jdk/pull/18562 From jbhateja at openjdk.org Thu Jun 6 19:31:03 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 6 Jun 2024 19:31:03 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) In-Reply-To: References: Message-ID: <-A6ONIqT2KzcT9yycwfiA2sBVevnWknhyvIRRysV6mU=.dbc05862-d8b1-4022-93b2-99646095bd89@github.com> On Mon, 1 Apr 2024 12:01:27 GMT, Jatin Bhateja wrote: > Summary of changes include with the patch:- > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin Hi @vnkozlov , Please let us know if its good to land in 23. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2153258321 From jbhateja at openjdk.org Thu Jun 6 19:31:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 6 Jun 2024 19:31:04 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 23:58:50 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/vm_version_x86.cpp line 113: > >> 111: VM_Version_StubGenerator(CodeBuffer *c) : StubCodeGenerator(c) {} >> 112: >> 113: address clear_apx_test_state() { > > Why do we need to clear_apx_test_state? r16 onwards are not callee saved. And checking r15 save/restore is not needed so we could remove r15 changes altogether. Yes, EGPRs are call clobbered registers, but here we are trying to ascertain if their values are preserved across signal handling. Explicit clearing of r16 and r31 during signal handling guarantees that preserved register values post signal handling were re-instantiated by operating system and not because they were not modified externally. > src/hotspot/cpu/x86/vm_version_x86.cpp line 447: > >> 445: /* FIXME: Uncomment after integration of JDK-8328998 >> 446: __ mov64(rax, VM_Version::egpr_test_value()); >> 447: __ cmpq(rax, r15); > > Likewise r15 validation can be removed. r15 validation showed contrasting results in comparison to r16 currently, But its fair enough to remove it. DONE > src/hotspot/cpu/x86/vm_version_x86.cpp line 456: > >> 454: // Generate SEGV to signal unsuccessful save/restore. >> 455: __ bind(apx_save_restore_error); >> 456: __ lea(rax, ExternalAddress(VM_Version::_apx_state_restore_error_handler)); > > Generating an error message here won't be the right thing (especially since this is default by feature detection). 
It should only result in setting UseAPX feature to false. DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630107399 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630107493 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630108400 From kvn at openjdk.org Thu Jun 6 19:31:48 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Jun 2024 19:31:48 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 992: > 990: // * No condition for this * void ALWAYSINLINE jecxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } > 991: > 992: // Short versions of the above These all branch instructions were added recently [#18893](https://github.com/openjdk/jdk/pull/18893) for JDK-8320448 which is not pushed yet. So I will suggest to not remove them. src/hotspot/cpu/x86/vm_version_x86.hpp line 666: > 664: // Feature identification which can be affected by VM settings > 665: // > 666: static bool supports_cpuid() { return _features != 0; } I suggest to not touch this file. Some CPU features could used in a future. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19550#discussion_r1630110159 PR Review Comment: https://git.openjdk.org/jdk/pull/19550#discussion_r1630112304 From kvn at openjdk.org Thu Jun 6 19:45:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Jun 2024 19:45:45 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) In-Reply-To: <-A6ONIqT2KzcT9yycwfiA2sBVevnWknhyvIRRysV6mU=.dbc05862-d8b1-4022-93b2-99646095bd89@github.com> References: <-A6ONIqT2KzcT9yycwfiA2sBVevnWknhyvIRRysV6mU=.dbc05862-d8b1-4022-93b2-99646095bd89@github.com> Message-ID: On Thu, 6 Jun 2024 19:27:47 GMT, Jatin Bhateja wrote: > Hi @vnkozlov , Please let us know if its good to land in 23. No, I don't see the urgency. We need extensive testing that everything works with APX. It is actually good time to push it into JDK 24 to have long testing period before next release. Let us review it and test before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2153275647 From kvn at openjdk.org Thu Jun 6 19:45:46 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Jun 2024 19:45:46 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 19:31:03 GMT, Jatin Bhateja wrote: >> Summary of changes include with the patch:- >> >> 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) >> 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. Actually we can't even fully test it until VM start using all registers provided by APX. And we don't have HW currently. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2153278275 PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2153280371 From kvn at openjdk.org Thu Jun 6 20:31:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Jun 2024 20:31:19 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 19:31:03 GMT, Jatin Bhateja wrote: >> Summary of changes include with the patch:- >> >> 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) >> 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. Few comments src/hotspot/cpu/x86/vm_version_x86.cpp line 1052: > 1050: > 1051: // Currently APX support is only enabled for targets supporting AVX512VL feature. > 1052: if (UseAPX && (!supports_apx_f() || !supports_avx512vl())) { This code should be after UseAVX checks. src/hotspot/cpu/x86/vm_version_x86.cpp line 1062: > 1060: if (UseAVX < 2) { > 1061: _features &= ~CPU_AVX2; > 1062: _features &= ~CPU_AVX_IFMA; Since value of UseAVX affects avx512vl it should affect UseAPX/CPU_APX_F too. 
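For illustration only, the dependency raised in the comment above can be sketched as follows. This is a minimal model, not the code in `vm_version_x86.cpp`: the bit values, the `Config` struct and `apply_avx_level` are hypothetical names invented for the example. It assumes that lowering the AVX level masks out the AVX-512 bits first, so that a dependent APX check placed after the masking sees the final feature set.

```cpp
#include <cstdint>

// Hypothetical feature bits; the real CPU_* constants in HotSpot differ.
constexpr uint64_t F_AVX      = 1ull << 0;
constexpr uint64_t F_AVX2     = 1ull << 1;
constexpr uint64_t F_AVX512F  = 1ull << 2;
constexpr uint64_t F_AVX512VL = 1ull << 3;
constexpr uint64_t F_APX_F    = 1ull << 4;

struct Config {
  int      use_avx;   // models the UseAVX flag (0..3)
  bool     use_apx;   // models the UseAPX flag
  uint64_t features;  // models the detected CPU feature mask
};

inline bool has(const Config& c, uint64_t bit) { return (c.features & bit) != 0; }

// Apply the user-selected AVX level, then re-evaluate the dependent APX flag.
inline void apply_avx_level(Config& c) {
  if (c.use_avx < 3) c.features &= ~(F_AVX512F | F_AVX512VL);
  if (c.use_avx < 2) c.features &= ~F_AVX2;
  if (c.use_avx < 1) c.features &= ~F_AVX;

  // The APX check runs after the masking above, so a lowered AVX level
  // that removed AVX512VL also switches APX off and clears its bit.
  if (!has(c, F_APX_F) || !has(c, F_AVX512VL)) {
    c.use_apx = false;
    c.features &= ~F_APX_F;
  }
}
```

With all bits set and `use_avx = 2`, the sketch ends with `use_apx == false` and the APX bit cleared, while `use_avx = 3` leaves APX enabled.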
src/hotspot/cpu/x86/vm_version_x86.hpp line 337: > 335: static address _cpuinfo_cont_addr; // address of instruction after the one which causes SEGV > 336: static address _cpuinfo_segv_addr_apx; // address of instruction which causes APX specific SEGV > 337: static address _cpuinfo_cont_addr_apx; // address of instruction which causes APX specific SEGV Duplicated comment. It should continuation address comment. ------------- PR Review: https://git.openjdk.org/jdk/pull/18562#pullrequestreview-2103120144 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630190641 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630185873 PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630187773 From kvn at openjdk.org Thu Jun 6 20:31:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Jun 2024 20:31:20 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 19:25:02 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 113: >> >>> 111: VM_Version_StubGenerator(CodeBuffer *c) : StubCodeGenerator(c) {} >>> 112: >>> 113: address clear_apx_test_state() { >> >> Why do we need to clear_apx_test_state? r16 onwards are not callee saved. And checking r15 save/restore is not needed so we could remove r15 changes altogether. > > Yes, EGPRs are call clobbered registers, but here we are trying to ascertain if their values are preserved across signal handling. Explicit clearing of r16 and r31 during signal handling guarantees that preserved register values post signal handling were re-instantiated by operating system and not because they were not modified externally. Please, add comment about that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630170720 From kvn at openjdk.org Thu Jun 6 20:40:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Jun 2024 20:40:28 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v52] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 17:41:20 GMT, Scott Gibbons wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright & a couple of comment typos > > Hi, everyone. I see that JDK 23 has now been forked, and new commits go into the JDK 24 branch. I would like to get this in as soon as possible to have as much time with fuzzers, etc. for everyone to be confident in the code. > > I have 3 positive reviews on this PR and would like to integrate. Please reply as soon as you reasonably can with objections or approval and I will integrate. Thanks. @asgibbons, my testing almost finished. No new failures. I think this can be pushed now. Thank you for waiting! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2153366787 From mli at openjdk.org Thu Jun 6 21:17:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 6 Jun 2024 21:17:21 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Wed, 5 Jun 2024 13:14:35 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. 
>> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. >> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Remove tmp file Some comments. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 981: > 979: } > 980: > 981: void MacroAssembler::load_link(const address source, Register temp) { maybe modify to `load_jump_link` or `load_link_jump`? 
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987: > 985: int64_t distance = source - pc(); > 986: assert(is_simm32(distance), "Must be"); > 987: Assembler::auipc(temp, (int32_t)distance + 0x800); Is it possible to use `jal` instead of the instruction sequence when is_simm21 == true as in jump_link? src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1571: > 1569: }; > 1570: > 1571: enum NativeShortCall { Thanks for moving these into a separate name space, looks much better. Seems the naming convention of enum is with "_", not sure if we need to stick to it. NativeShortCall also looks good. src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 519: > 517: > 518: address NativeCall::instruction_address() const { > 519: if (!UseTrampolines) { use positive condition? similar suggestion for below conditions. src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 72: > 70: bool is_jump() const { return MacroAssembler::is_jump_at(addr_at(0)); } > 71: bool is_call() const { return is_call_at(addr_at(0)); } > 72: static bool is_call_at(address addr); Is this indirection of `is_call_at` necessary? seems only is_call is calling is_call_at? 
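As background for the `+ 0x800` in the quoted `auipc` line: on RISC-V a 32-bit PC-relative offset is usually split into an upper 20-bit part for `auipc` and a signed 12-bit immediate for the following load or `jalr`. Because the low part is sign-extended, the upper part has to be rounded by adding 0x800 first. The sketch below shows only that arithmetic; `HiLo`, `split_pcrel` and `recombine` are hypothetical helpers, and how HotSpot's `Assembler::auipc` consumes its immediate is an assumption here, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Upper and lower parts of a PC-relative offset (hypothetical helper type).
struct HiLo {
  int32_t hi20;  // value placed in the auipc immediate field
  int32_t lo12;  // signed 12-bit immediate of the following ld/jalr
};

// Split an offset so that (hi20 << 12) + lo12 == offset.
inline HiLo split_pcrel(int32_t offset) {
  // Low 12 bits, interpreted as a signed value in [-2048, 2047].
  int32_t lo12 = static_cast<int32_t>(static_cast<uint32_t>(offset) & 0xfffu);
  if (lo12 >= 0x800) lo12 -= 0x1000;
  // offset - lo12 is a multiple of 4096, so the division is exact.
  // This equals (offset + 0x800) >> 12 with an arithmetic shift.
  int64_t hi = (static_cast<int64_t>(offset) - lo12) / 4096;
  assert(hi * 4096 + lo12 == offset);
  return HiLo{static_cast<int32_t>(hi), lo12};
}

// Recompute the offset the way the hardware would add the two parts.
inline int64_t recombine(HiLo p) {
  return static_cast<int64_t>(p.hi20) * 4096 + p.lo12;
}
```

For example, an offset of 0x12345FFF splits into `hi20 = 0x12346` and `lo12 = -1`; without the rounding the upper part would be 0x12345 and the recombined address would be 4096 bytes short.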
------------- PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2102103344 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1629630445 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1630196965 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1629989406 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1630248686 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1630234949 From sgibbons at openjdk.org Thu Jun 6 21:47:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 6 Jun 2024 21:47:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: <9Gep5o1EEF96gprsHB1vDiw8KSQON-c6uh_9gBJyq9c=.43962158-2f23-4929-9e72-d4827a4fa5e8@github.com> References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> <9Gep5o1EEF96gprsHB1vDiw8KSQON-c6uh_9gBJyq9c=.43962158-2f23-4929-9e72-d4827a4fa5e8@github.com> Message-ID: <8kmAaqEcZiqqRB0MSsNG2jbHkgQ-9p3DH_AHBZsBwr0=.be5d30cd-03b4-4446-8105-1d694cd3d7e4@github.com> On Thu, 30 May 2024 16:20:02 GMT, Emanuel Peter wrote: >> @vnkozlov OK. I'll defer to you all. I've contacted the author of the fuzzer to see what I can do to set up a local instance. Would this be sufficient to increase confidence for future submissions? We can run it perpetually on fixes (provided I can set it up). Had I done that, we could have had 6 months of fuzzing on top of our tests. Would that have alleviated this concern? > > @asgibbons I generally just stop pushing ANY RFE's a week or two before the fork. Even if you did run the fuzzer with it - there are often last-minute changes. And your code here is rather large, so even if you are confident, there must be at least one bug hiding. 
>
> Running the fuzzer is nice as pre-integration, but it mostly only catches things post-integration.

@eme64 Are you OK with me integrating?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2153456076

From wanghaomin at openjdk.org Fri Jun 7 01:22:17 2024
From: wanghaomin at openjdk.org (Wang Haomin)
Date: Fri, 7 Jun 2024 01:22:17 GMT
Subject: RFR: 8329421: Native methods can not be selectively printed [v2]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 2 Apr 2024 07:23:25 GMT, Volker Simonis wrote:

>> Native methods (i.e. "native wrappers") can not be selectively printed with `-XX:CompileCommand=print,class::method`. Currently the only way to print native methods is to use the global `-XX:+PrintAssembly` option. But this prints *all* compiled methods which can be too much if we're just interested in a specific native wrapper. There's no reason to not apply `-XX:CompileCommand` options correctly to native methods as well.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add test for -XX:+PrintNativeNMethods

That's great! Thank you for resolving it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18567#issuecomment-2153686077

From kvn at openjdk.org Fri Jun 7 02:07:24 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 7 Jun 2024 02:07:24 GMT
Subject: RFR: 8329141: Obsolete RTM flags and code
Message-ID: 

Obsolete HotSpot RTM flags which were deprecated in JDK 23. RTM related VM code and tests were removed.
Tested tier1-3,stress,xcomp ------------- Commit messages: - 8329141: Obsolete RTM flags and code Changes: https://git.openjdk.org/jdk/pull/19589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329141 Stats: 6396 lines in 97 files changed: 27 ins; 6334 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/19589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19589/head:pull/19589 PR: https://git.openjdk.org/jdk/pull/19589 From jbhateja at openjdk.org Fri Jun 7 02:16:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 7 Jun 2024 02:16:27 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v3] In-Reply-To: References: Message-ID: > Summary of changes include with the patch:- > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments addressed. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18562/files - new: https://git.openjdk.org/jdk/pull/18562/files/b5da0938..68df08ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=01-02 Stats: 8 lines in 2 files changed: 6 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18562/head:pull/18562 PR: https://git.openjdk.org/jdk/pull/18562 From jbhateja at openjdk.org Fri Jun 7 02:16:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 7 Jun 2024 02:16:27 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 20:26:43 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1052: > >> 1050: >> 1051: // Currently APX support is only enabled for targets supporting AVX512VL feature. >> 1052: if (UseAPX && (!supports_apx_f() || !supports_avx512vl())) { > > This code should be after UseAVX checks. Its purposefully placed after modifications to CPU_* features flags if user explicitly sets UseAVX < 3. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630536165 From jbhateja at openjdk.org Fri Jun 7 02:19:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 7 Jun 2024 02:19:12 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 19:41:25 GMT, Vladimir Kozlov wrote: > Actually we can't even fully test it until VM start using all registers provided by APX. 
Hi @vnkozlov,
EGPR state restoration across signal handling can only be validated once OS support is available. CPUID and UseAPX validation has been done using the [Intel® Software Development Emulator](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). Other comments are addressed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2153731083

From kvn at openjdk.org Fri Jun 7 02:23:16 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 7 Jun 2024 02:23:16 GMT
Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) [v2]
In-Reply-To: 
References: 
Message-ID: <-GsFrqzgkkbVyNAp7-mi4aIOYdwgehyRe8nI3xrV1tw=.4c723742-1d7f-4b98-859a-9d87230859fe@github.com>

On Fri, 7 Jun 2024 02:12:31 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/vm_version_x86.cpp line 1052:
>>
>>> 1050:
>>> 1051: // Currently APX support is only enabled for targets supporting AVX512VL feature.
>>> 1052: if (UseAPX && (!supports_apx_f() || !supports_avx512vl())) {
>>
>> This code should be after UseAVX checks.
>
> Its purposefully placed after modifications to CPU_* features flags if user explicitly sets UseAVX < 3.

Got it. I missed that we have separate UseAVX checks for < 3, < 2 and < 1.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630542872 From kvn at openjdk.org Fri Jun 7 02:47:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Jun 2024 02:47:13 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v3] In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 02:16:27 GMT, Jatin Bhateja wrote: >> Summary of changes include with the patch:- >> >> 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) >> 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments addressed. src/hotspot/cpu/x86/vm_version_x86.cpp line 443: > 441: > 442: /* FIXME: Uncomment while integrating JDK-8329032 > 443: bool save_apx = UseAPX; What are you missing to uncomment this code? 8329032 is about `.ad` file changes. It should not affect execution of this code. You need changes in `register_x86.*` files and may be somewhere else but you don't need C2 changes for this code to work. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630559908 From iklam at openjdk.org Fri Jun 7 02:56:14 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 7 Jun 2024 02:56:14 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <0B3DXYpRB8P6bEQP2ACupaLG9RRAfEe3PflYvpE3ORs=.4a2b8b09-2743-48ac-baff-f2fc6df3944b@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> <0B3DXYpRB8P6bEQP2ACupaLG9RRAfEe3PflYvpE3ORs=.4a2b8b09-2743-48ac-baff-f2fc6df3944b@github.com> Message-ID: On Wed, 5 Jun 2024 22:22:11 GMT, Calvin Cheung wrote: >>> -Xlog:init means "I want to see logs related to initialization", so it should enable all the counters for printing the related logs. >> >> I don't agree. Initialization logging could encompass many different things, some of which are individually controllable via different flags. Simply turning on init logging should not turn on all such flags. If you want that level of coupling then perhaps use init_counters (or something like that) to make it clear this is not a general log tag intended for any initialization code to use, but something you have chosen to tie to specific functionality. >> >>> We may add several groups of counters in the future. We don't want to force the user to enumerate all these counters >> >> It is not clear to me how you envisage that working. You want individual group switches plus a global one? > > @dholmes-ora, @iklam Could you review my latest commit? > Thanks! The latest version looks good to me. My only suggestion is to use `log_is_enabled(Info, perf, class, link)` directly, because it's very efficient. 
To verify, I wrote a function like this:

    void foo() {
      if (log_is_enabled(Info, class, load)) {
        tty->print_cr("Hello");
      }
    }

GCC compiles it to the following. So the test is implemented as a single load from memory.

    _Z3foov:
    .LFB8725:
            .cfi_startproc
            movq    64+_ZN16LogTagSetMappingILN6LogTag4typeE16ELS1_74ELS1_0ELS1_0ELS1_0ELS1_0EE7_tagsetE(%rip), %rax
            testq   %rax, %rax
            je      .L6
            movq    tty@GOTPCREL(%rip), %rax
            leaq    .LC0(%rip), %rsi
            movq    (%rax), %rdi
            xorl    %eax, %eax
            jmp     _ZN12outputStream8print_crEPKcz@PLT
            ret

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1630565751

From iklam at openjdk.org Fri Jun 7 02:56:13 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Fri, 7 Jun 2024 02:56:13 GMT
Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v7]
In-Reply-To: 
References: 
Message-ID: 

On Wed, 5 Jun 2024 20:59:09 GMT, Calvin Cheung wrote:

>> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified.
>>
>> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters.
>>
>> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output.
>>
>> Passed tiers 1 - 4 testing.
>
> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision:
>
>   implement -Xlog:perf+class+link

Marked as reviewed by iklam (Reviewer).
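For readers wondering why a `log_is_enabled`-style check in this thread can reduce to one load, here is a minimal sketch of the general pattern. It is an assumption-laden model, not HotSpot's unified logging code: `TagSet`, `TagSetMapping`, `TAG_CLASS`, `TAG_LOAD` and `class_load_enabled` are hypothetical names. The idea is that each tag combination owns one statically allocated object, and "is this level enabled" is a null check on a per-level slot of that object.

```cpp
#include <cstdio>

enum Level { Trace = 0, Debug, Info, Warning, Error, LevelCount };

// Per-tag-combination state: one slot per level, non-null when some output
// is configured to receive messages of that level.
struct TagSet {
  const char* first_output[LevelCount] = {};

  bool is_level(Level level) const { return first_output[level] != nullptr; }

  // Enabling a level also enables every more severe level.
  void configure(Level level, const char* output_name) {
    for (int i = 0; i < LevelCount; i++) {
      first_output[i] = (i >= level) ? output_name : nullptr;
    }
  }
};

// One static TagSet per tag combination, so its address is a link-time constant.
template <int Tag0, int Tag1 = 0>
struct TagSetMapping {
  static TagSet tagset;
};
template <int Tag0, int Tag1>
TagSet TagSetMapping<Tag0, Tag1>::tagset;

constexpr int TAG_CLASS = 1;  // hypothetical tag ids
constexpr int TAG_LOAD  = 2;

// The check reads one slot at a fixed offset from a static object.
inline bool class_load_enabled(Level level) {
  return TagSetMapping<TAG_CLASS, TAG_LOAD>::tagset.is_level(level);
}

// Returns 1 if the message was printed, 0 if logging was disabled.
inline int foo() {
  if (class_load_enabled(Info)) {
    std::puts("Hello");
    return 1;
  }
  return 0;
}
```

Because the level is a compile-time constant at the call site, an optimizing compiler can fold the array index into a fixed displacement, which matches the shape of the `movq 64+...(%rip)` followed by `testq` in the listing above.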
-------------

PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2103531253

From jbhateja at openjdk.org Fri Jun 7 03:55:13 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 7 Jun 2024 03:55:13 GMT
Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 7 Jun 2024 02:45:01 GMT, Vladimir Kozlov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review comments addressed.
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 443:
>
>> 441:
>> 442: /* FIXME: Uncomment while integrating JDK-8329032
>> 443: bool save_apx = UseAPX;
>
> What are you missing to uncomment this code?
> 8329032 is about `.ad` file changes. It should not affect execution of this code.
> You need changes in `register_x86.*` files and may be somewhere else but you don't need C2 changes for this code to work.

Yes, we already have that in place with https://github.com/openjdk/jdk/pull/19042, which will be open for review after this patch. I added it in comments since this piece of logic is centered around CPUID feature check and pertinent to this patch.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1630611657

From amitkumar at openjdk.org Fri Jun 7 03:55:29 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 7 Jun 2024 03:55:29 GMT
Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v5]
In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com>
References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com>
Message-ID:

> s390x port for recursive locking.
>
> testing:
> - [x] build fastdebug-vm
> - [x] build slowdebug-vm
> - [x] build release-vm
> - [x] build optimized-vm
> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (release-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] tier1 with fastdebug-vm
> - [x] tier1 with slowdebug-vm
> - [x] tier1 with release-vm
>
> *BenchMarks*:
>
> Results from Performance LPARs :
>
>
> Locking Mode = 1 (without Patch)
>
> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
> LockUnlock.testContendedLock                      100  avgt   12      5.144 ±  0.035  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12   3824.742 ± 89.475  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     25.348 ±  0.559  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    466.629 ±  3.036  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    468.532 ±  1.793  ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
> Locking Mode = 1 (with patch)
>
> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
> LockUnlock.testContendedLock                      100  avgt   12      5.146 ±  0.027  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12   3833.175 ± 75.863  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     25.206 ±  0.519  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    473.973 ±  2.103  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    470.749 ±  2.229  ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
>
>
>
> Locking Mode = 2 (without Patch)
>
> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
> LockUnlock.testContendedLock                      100  avgt   12      4.688 ±  0.051  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12  12800.544 ± 92.265  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     26.486 ±  2.229  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    424.499 ±  0.416  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    424.241 ±  0.840  ns/op
> Finished running test 'micro:vm.lang.Lo...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  not using load_const_optimized in compiler_fast_lock_lightweight_object

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18878/files
  - new: https://git.openjdk.org/jdk/pull/18878/files/2584484d..db7b6a6f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18878.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18878/head:pull/18878

PR: https://git.openjdk.org/jdk/pull/18878

From epeter at openjdk.org Fri Jun 7 05:08:26 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 7 Jun 2024 05:08:26 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49]
In-Reply-To: <8kmAaqEcZiqqRB0MSsNG2jbHkgQ-9p3DH_AHBZsBwr0=.be5d30cd-03b4-4446-8105-1d694cd3d7e4@github.com>
References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> <9Gep5o1EEF96gprsHB1vDiw8KSQON-c6uh_9gBJyq9c=.43962158-2f23-4929-9e72-d4827a4fa5e8@github.com> <8kmAaqEcZiqqRB0MSsNG2jbHkgQ-9p3DH_AHBZsBwr0=.be5d30cd-03b4-4446-8105-1d694cd3d7e4@github.com>
Message-ID: <0XVuJ7gECpxt76s5lju6aMOqcZK9MJ07dtlumvonwZw=.3f199acf-047c-4aee-bcc6-e3fa9f4f4bf5@github.com>

On Thu, 6 Jun 2024 21:44:44 GMT, Scott Gibbons wrote:

>> @asgibbons I generally just stop pushing ANY RFE's a week or two before the fork. Even if you did run the fuzzer with it - there are often last-minute changes. And your code here is rather large, so even if you are confident, there must be at least one bug hiding.
>> >> Running the fuzzer is nice as pre-integration, but it mostly only catches things post-integration. > > @eme64 Are you OK with me integrating? @asgibbons yes, ship it! ? Thanks for waiting! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2154015592 From varadam at openjdk.org Fri Jun 7 06:56:19 2024 From: varadam at openjdk.org (Varada M) Date: Fri, 7 Jun 2024 06:56:19 GMT Subject: RFR: 8331733: [PPC64] saving and restoring CR is not needed at most places [v5] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 11:24:02 GMT, Varada M wrote: >> PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. >> Fastdebug: build and tier1 testing successful. [unrelated failures] > > Varada M has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into LR > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places > - [PPC64] saving and restoring CR is not needed at most places Thank you, ------------- PR Comment: https://git.openjdk.org/jdk/pull/19494#issuecomment-2154207330 From fyang at openjdk.org Fri Jun 7 07:20:13 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Jun 2024 07:20:13 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Thu, 6 Jun 2024 14:10:27 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request incrementally with one additional 
commit since the last revision:
>>
>>   Remove tmp file
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 981:
>
>> 979: }
>> 980:
>> 981: void MacroAssembler::load_link(const address source, Register temp) {
>
> maybe modify to `load_jump_link` or `load_link_jump`?

I am considering names like `indirect_jump_link` :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1630754606

From rehn at openjdk.org Fri Jun 7 07:20:14 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 7 Jun 2024 07:20:14 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7]
In-Reply-To:
References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com>
Message-ID:

On Thu, 6 Jun 2024 20:31:26 GMT, Hamlin Li wrote:

>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove tmp file
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 987:
>
>> 985: int64_t distance = source - pc();
>> 986: assert(is_simm32(distance), "Must be");
>> 987: Assembler::auipc(temp, (int32_t)distance + 0x800);
>
> Is it possible to use `jal` instead of the instruction sequence when is_simm21 == true as in jump_link?

Long story, sorry.

As this is a patchable call site, meaning we need full reach for later addresses, this site must also be able to load 'n jump, hence we need cmodx. Today we do cmodx in a sequentially consistent manner. This is done by emitting an IPI shootdown after every store to a published instruction stream. The cost of an IPI is significant, as all CPUs need to flush everything and start over. As we have tiny CPUs with few cores and little state, we don't really care much right now. I have measured this overhead on VF2 at around 0.5% on some workloads. But it will scale much worse than linearly as core count and complexity go up. Using this technique it would be possible.

As we need to change this for the bigger cores coming, and Zjid is delayed, we are getting some kernel features like setting up fence.i on context switches. This means we can use fence.i in userspace and trust that the kernel will emit fence.i if the CPU is changed after we emitted it. This allows the writer to skip the IPI, at least in many cases. When changing a series of instructions we need to know if the instruction fetching happens in order. Otherwise: + + + + Now we flip the jal: ` + + ` But if these are not read in order the I-fetcher might see: ` + + ` We can do this with IPI, but then we are more locked into IPI.

So before we have made an overhaul of cmodx (we may need 3-4 approaches depending on CPU, if we want the best performance) I prefer not to add code which is dependent on a certain cmodx approach (when it's slow).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1630756423

From fyang at openjdk.org Fri Jun 7 08:15:17 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 7 Jun 2024 08:15:17 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7]
In-Reply-To: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com>
References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com>
Message-ID:

On Wed, 5 Jun 2024 13:14:35 GMT, Robbin Ehn wrote:

>> Hi all, please consider!
>>
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> With a very small application, or one running for a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>>
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>>
>> Code stream:
>> JAL
>> Stubs:
>> AUIPC
>> LD
>> JALR
>>
>>
>>
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>>
>> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>>
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>>
>>
>> An experimental option for turning trampolines back on exists.
>>
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>>
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>>
>> fop (msec)            2239 |  2128 = 0.950424
>> h2 (msec)            18660 | 16594 = 0.889282
>> jython (msec)        22022 | 21925 = 0.995595
>> luindex (msec)        2866 |  2842 = 0.991626
>> lusearch (msec)       4108 |  4311 = 1.04942
>> lusearch-fix (msec)   4406 |  4116 = 0.934181
>> pmd (msec)            5976 |  5897 = 0.98678
>> jython (msec)        22022 | 21925 = 0.995595
>> Avg: 0.974112
>> fop(xcomp) (msec)     2721 |  2714 = 0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove tmp file

src/hotspot/cpu/riscv/relocInfo_riscv.cpp line 84:

> 82: if (NativeCall::is_at(addr())) {
> 83: NativeCall* nc = nativeCall_at(addr());
> 84: if (nc->reloc_set_destination(x)) {

Seems there is a subtle difference here. Previously, there was a cache invalidation operation in `NativeCall::set_destination_mt_safe` [1] which is called by this `Relocation::pd_set_call_destination`.
Now it's gone with this change: it's not there even in `NativeShortCall::reloc_set_destination`. Is that intended? [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/nativeInst_riscv.cpp#L96 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1630816438 From varadam at openjdk.org Fri Jun 7 08:53:16 2024 From: varadam at openjdk.org (Varada M) Date: Fri, 7 Jun 2024 08:53:16 GMT Subject: Integrated: 8331733: [PPC64] saving and restoring CR is not needed at most places In-Reply-To: References: Message-ID: <-AsHbsg6KtJSUFLfY5mklB12miz3_NnAl_gOBrMx5zs=.db86b746-9603-4f74-81d1-b8c86b642da2@github.com> On Fri, 31 May 2024 08:56:36 GMT, Varada M wrote: > PPC64 uses save/restore CR less often. Only LR is critical, CR is mainly needed for native-to-Java calls. > Fastdebug: build and tier1 testing successful. [unrelated failures] This pull request has now been integrated. Changeset: 40b2fbd8 Author: Varada M Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/40b2fbd8207404961d3d23375b288cceafc3f902 Stats: 93 lines in 11 files changed: 13 ins; 8 del; 72 mod 8331733: [PPC64] saving and restoring CR is not needed at most places Reviewed-by: mdoerr, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/19494 From shade at openjdk.org Fri Jun 7 09:09:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 7 Jun 2024 09:09:13 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v8] In-Reply-To: References: Message-ID: <8C0Uk9UpG01TwecXPwI-lbm-sDmNQhldXJnyhkF0xSQ=.539b0b1a-cf1d-4e72-93e5-a7ce1ceb93c7@github.com> On Wed, 5 Jun 2024 05:19:55 GMT, Tobias Hartmann wrote: > I'll run this through our correctness and performance testing and report back once it passed. How was it, Tobias? Any surprises? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2154415662 From jsjolen at openjdk.org Fri Jun 7 09:24:37 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 09:24:37 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v2] In-Reply-To: References: Message-ID: <5UDmdxZyg2v7gTe5yZBNkBupxgpqJGH_hlS9eEIHIos=.8de514e2-a650-4df3-b99c-405de34fc49e@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with three additional commits since the last revision: - Assert mustn't be nil, use new API - Use a char array with proper alignment - Add const variants and rename to at ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/294e0d2c..8bdfc55a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=00-01 Stats: 26 lines in 2 files changed: 16 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Fri Jun 7 09:29:34 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 09:29:34 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v3] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. 
For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
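
The indexed freelist idea described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `ToyIndexedFreelist` and its members are hypothetical names, `std::vector` stands in for HotSpot's `GrowableArray`, and the sketch restricts itself to trivially copyable element types so that growing the backing array by copying raw bytes stays valid. It shows the two properties the description relies on: handles are 4-byte indices rather than 8-byte pointers, and freed slots are chained into a linked list that is reused before the array grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch of an allocator that hands out 4-byte indices.
template <typename E>
class ToyIndexedFreelist {
    static_assert(std::is_trivially_copyable<E>::value,
                  "sketch only supports trivially copyable elements");

public:
    using Index = int32_t;           // the 4-byte "pointer"
    static constexpr Index nil = -1;

private:
    // A slot holds either a live element or the index of the next free slot.
    union Slot {
        Index next_free;
        alignas(E) unsigned char storage[sizeof(E)];
    };

    std::vector<Slot> _slots;        // stand-in for GrowableArray
    Index _free_head = nil;          // head of the linked list of unused slots

public:
    template <typename... Args>
    Index allocate(Args&&... args) {
        Index i;
        if (_free_head != nil) {     // reuse a freed slot first
            i = _free_head;
            _free_head = _slots[i].next_free;
        } else {                     // otherwise grow the backing array
            _slots.emplace_back();
            i = static_cast<Index>(_slots.size() - 1);
        }
        new (_slots[i].storage) E(std::forward<Args>(args)...);
        return i;
    }

    void deallocate(Index i) {
        assert(i != nil);
        at(i).~E();
        _slots[i].next_free = _free_head;  // push the slot onto the freelist
        _free_head = i;
    }

    E& at(Index i) {
        assert(i != nil);
        return *reinterpret_cast<E*>(_slots[i].storage);
    }

    // Number of slots ever created; this sketch never shrinks.
    std::size_t capacity() const { return _slots.size(); }
};
```

A caller would keep the returned `Index` wherever it would otherwise keep a pointer, for example in hash table buckets, and call `at(index)` to reach the element. Indices stay valid when the backing array reallocates, which raw pointers into it would not.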
Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:

 - Remove operator[]
 - Correct casts

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18979/files
  - new: https://git.openjdk.org/jdk/pull/18979/files/8bdfc55a..67671874

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=01-02

  Stats: 22 lines in 2 files changed: 0 ins; 12 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/18979.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979

PR: https://git.openjdk.org/jdk/pull/18979

From aph-open at littlepinkcloud.com Fri Jun 7 09:38:14 2024
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Fri, 7 Jun 2024 10:38:14 +0100
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines
In-Reply-To:
References:
Message-ID: <3e6ea114-b66a-47c1-82bb-9e7fd8416634@littlepinkcloud.com>

On 5/29/24 15:28, Robbin Ehn wrote:

> On some CPUs L1D and L1I can't contain the same cache line, which means
> the trampoline stub can bounce from L1I->L1D->L1I, which is
> expensive.

Wouldn't it be a lot easier simply to put the target address loaded by the trampoline into the constant pool?

> Even if you don't have that problem having a call to a jump is not the
> fastest way.

I guess the real problem there is that the jalr #imm range is pretty short, so taking a trampoline is a very common case, and you have to optimize for that. On AArch64 we optimize for the simple branch.

BTW, on AArch64 we don't have all the problems described in Zjid. For example, if you modify code, do the icache invalidate dance, then patch a jump so that it points to the newly-modified code, every observer sees the new code that was modified before it sees the patched jump.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From gcao at openjdk.org Fri Jun 7 09:49:18 2024
From: gcao at openjdk.org (Gui Cao)
Date: Fri, 7 Jun 2024 09:49:18 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3]
In-Reply-To:
References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com>
Message-ID:

On Tue, 4 Jun 2024 08:55:24 GMT, Hamlin Li wrote:

> There is a bit of a regression in the cases of testNegative63/64; although these might be rare or not very common cases, it's worth trying to improve them if possible. I guess it's related to the implementation for the cases when the bitmap is full. When it's full, before going to `repne_scan`, there are some instructions to execute. I wonder if it will help to have another "bitmap full test" just after the "bitmap false test" (which is `test_bit(t0, r_bitmap, bit);`). But I'm not sure if it's feasible; maybe worth a try.

Hi, sorry for being late. Thanks for the suggestion, I gave it a try.

``` diff
diff --git a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
index 61e8016db4a..af1649c061f 100644
--- a/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp
@@ -3664,6 +3664,32 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
   test_bit(t0, r_bitmap, bit);
   beqz(t0, L_fallthrough);
 
+  // We will consult the secondary-super array.
+  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
+
+  mv(tmp3, r_bitmap);
+  if (bit != 0) {
+    ror_imm(tmp3, tmp3, bit);
+  }
+
+  Label skip;
+  addi(t0, tmp3, (u1)1);
+  bnez(t0, skip);
+
+  // Load the array length.
+  lwu(r_array_length, Address(r_array_base, Array::length_offset_in_bytes()));
+  // And adjust the array base to point to the data.
+  // NB! Effectively increments current slot index by 1.
+  assert(Array::base_offset_in_bytes() == wordSize, "");
+  addi(r_array_base, r_array_base, Array::base_offset_in_bytes());
+
+  repne_scan(r_array_base, r_super_klass, r_array_length, t0);
+  bne(r_super_klass, t0, L_fallthrough);
+  mv(result, zr);
+  beqz(result, L_fallthrough);
+
+  bind(skip);
+
   // Get the first array index that can contain super_klass into r_array_index.
   if (bit != 0) {
     slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
@@ -3672,9 +3698,6 @@ bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,
     mv(r_array_index, (u1)1);
   }
 
-  // We will consult the secondary-super array.
-  ld(r_array_base, Address(r_sub_klass, in_bytes(Klass::secondary_supers_offset())));
-
   // The value i in r_array_index is >= 1, so even though r_array_base
   // points to the length, we don't need to adjust it to point to the data.
   assert(Array::base_offset_in_bytes() == wordSize, "Adjust this code");
```

#### JMH tested on a Banana Pi BPI-F3 board (has Zbb) with UseZbb enabled; UseZba and UseZbs not enabled

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt   15   13.262 ± 0.197  ns/op
SecondarySupersLookup.testNegative01  avgt   15   13.273 ± 0.222  ns/op
SecondarySupersLookup.testNegative02  avgt   15   13.264 ± 0.199  ns/op
SecondarySupersLookup.testNegative03  avgt   15   13.275 ± 0.222  ns/op
SecondarySupersLookup.testNegative04  avgt   15   13.264 ± 0.198  ns/op
SecondarySupersLookup.testNegative05  avgt   15   13.259 ± 0.192  ns/op
SecondarySupersLookup.testNegative06  avgt   15   13.260 ± 0.195  ns/op
SecondarySupersLookup.testNegative07  avgt   15   13.275 ± 0.221  ns/op
SecondarySupersLookup.testNegative08  avgt   15   13.261 ± 0.196  ns/op
SecondarySupersLookup.testNegative09  avgt   15   13.267 ± 0.201  ns/op
SecondarySupersLookup.testNegative10  avgt   15   13.272 ± 0.211  ns/op
SecondarySupersLookup.testNegative16  avgt   15   13.271 ± 0.200  ns/op
SecondarySupersLookup.testNegative20  avgt   15   13.271 ± 0.210  ns/op
SecondarySupersLookup.testNegative30  avgt   15   13.277 ± 0.219  ns/op
SecondarySupersLookup.testNegative32  avgt   15   13.280 ± 0.224  ns/op
SecondarySupersLookup.testNegative40  avgt   15   13.285 ± 0.232  ns/op
SecondarySupersLookup.testNegative50  avgt   15   13.288 ± 0.237  ns/op
SecondarySupersLookup.testNegative55  avgt   15   54.940 ± 0.771  ns/op
SecondarySupersLookup.testNegative56  avgt   15   54.934 ± 0.798  ns/op
SecondarySupersLookup.testNegative57  avgt   15   54.909 ± 0.766  ns/op
SecondarySupersLookup.testNegative58  avgt   15   54.679 ± 0.830  ns/op
SecondarySupersLookup.testNegative59  avgt   15   54.941 ± 0.819  ns/op
SecondarySupersLookup.testNegative60  avgt   15   76.957 ± 0.945  ns/op
SecondarySupersLookup.testNegative61  avgt   15   76.956 ± 1.007  ns/op
SecondarySupersLookup.testNegative62  avgt   15   76.938 ± 0.976  ns/op
SecondarySupersLookup.testNegative63  avgt   15  138.371 ± 1.192  ns/op
SecondarySupersLookup.testNegative64  avgt   15  140.137 ± 1.077  ns/op
SecondarySupersLookup.testPositive01  avgt   15   10.734 ± 0.149  ns/op
SecondarySupersLookup.testPositive02  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive03  avgt   15   10.727 ± 0.146  ns/op
SecondarySupersLookup.testPositive04  avgt   15   10.735 ± 0.157  ns/op
SecondarySupersLookup.testPositive05  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive06  avgt   15   10.735 ± 0.155  ns/op
SecondarySupersLookup.testPositive07  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive08  avgt   15   10.730 ± 0.149  ns/op
SecondarySupersLookup.testPositive09  avgt   15   10.731 ± 0.151  ns/op
SecondarySupersLookup.testPositive10  avgt   15   10.728 ± 0.148  ns/op
SecondarySupersLookup.testPositive16  avgt   15   10.733 ± 0.151  ns/op
SecondarySupersLookup.testPositive20  avgt   15   10.728 ± 0.146  ns/op
SecondarySupersLookup.testPositive30  avgt   15   10.737 ± 0.162  ns/op
SecondarySupersLookup.testPositive32  avgt   15   10.733 ± 0.154  ns/op
SecondarySupersLookup.testPositive40  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive50  avgt   15   10.732 ± 0.151  ns/op
SecondarySupersLookup.testPositive60  avgt   15   10.729 ± 0.148  ns/op
SecondarySupersLookup.testPositive63  avgt   15   10.730 ± 0.150  ns/op
SecondarySupersLookup.testPositive64  avgt   15   10.735 ± 0.158  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

After the fix it got better for the testNegative63/64 cases. As said above, this case is very rare. Is there a need for this additional complexity?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2154485874

From duke at openjdk.org Fri Jun 7 09:52:37 2024
From: duke at openjdk.org (Liming Liu)
Date: Fri, 7 Jun 2024 09:52:37 GMT
Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v10]
In-Reply-To:
References:
Message-ID:

> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14.
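
The two-step check described above (kernel version first, then a probe of the advice value) might look roughly like the sketch below. It is an assumption-laden illustration, not the code from the PR: `parse_kernel_release`, `kernel_at_least` and `can_use_madv_populate_write` are hypothetical helper names, and the release strings in the usage note are made-up examples. The only facts taken from the description are the advice value 23 and the 5.14 version threshold.

```cpp
#include <cassert>
#include <cstdio>
#ifdef __linux__
#include <sys/mman.h>
#include <sys/utsname.h>
#include <unistd.h>
#endif

// Parse "major.minor" from a kernel release string such as "5.14.0-generic".
// Returns false if the string does not start with two dot-separated numbers.
bool parse_kernel_release(const char* release, int* major, int* minor) {
    assert(release != nullptr);
    return std::sscanf(release, "%d.%d", major, minor) == 2;
}

// True if the release string names a kernel at least as new as want_major.want_minor.
bool kernel_at_least(const char* release, int want_major, int want_minor) {
    int major = 0;
    int minor = 0;
    if (!parse_kernel_release(release, &major, &minor)) {
        return false;
    }
    return major > want_major || (major == want_major && minor >= want_minor);
}

#ifdef __linux__
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // value named in the PR description
#endif

// Only probe the advice value on kernels where 23 is known to mean
// MADV_POPULATE_WRITE; older vendor kernels may use 23 for something else.
bool can_use_madv_populate_write() {
    struct utsname u;
    if (uname(&u) != 0 || !kernel_at_least(u.release, 5, 14)) {
        return false;
    }
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return false;
    }
    void* p = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return false;
    }
    bool ok = madvise(p, page, MADV_POPULATE_WRITE) == 0;
    munmap(p, page);
    return ok;
}
#endif
```

With these helpers, a hypothetical release string like "5.4.17-uek" is rejected by `kernel_at_least(release, 5, 14)` before any `madvise` call is attempted, while "5.14.0" or "6.1.0" proceeds to the probe. The version gate comes first because on a kernel that reuses the value 23 for another advice, the probe alone could succeed and give a misleading answer.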
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Guard more madv numbers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18592/files - new: https://git.openjdk.org/jdk/pull/18592/files/fe98ec0a..cb2adb8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=08-09 Stats: 62 lines in 2 files changed: 57 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18592.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18592/head:pull/18592 PR: https://git.openjdk.org/jdk/pull/18592 From jsjolen at openjdk.org Fri Jun 7 10:26:46 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 10:26:46 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v4] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. 
> > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Fix perf style - Use %zu ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/67671874..b1d264d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=02-03 Stats: 10 lines in 2 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Fri Jun 7 10:39:15 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 10:39:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v4] In-Reply-To: References: Message-ID: <9-v3yeLLzx9bPGRmUMsqWXEg-HovFKdLV-fVdeZmSzE=.b1a708e4-e0ff-4ed2-a12f-a86fcff1dd74@github.com> On Fri, 7 Jun 2024 10:26:46 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. 
It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
> > Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: > > - Fix perf style > - Use %zu I actually messed up the performance numbers for `linux-x64` and accidentally just ran the slow-debug test again :). Here are the actual results, now we're beating Arena and CHeap even harder than before so that's great. Generate stacks... Done Time taken with GrowableArray: 3521.723657 Time taken with CHeap: 8549.043628 Time taken with Arena: 5510.189990 Time taken with GrowableArray again: 3535.035375 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2154566422 From jsjolen at openjdk.org Fri Jun 7 10:54:43 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 10:54:43 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v5] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. 
> > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with three additional commits since the last revision: - Fix access specifiers - Use access specifiers and class for I - Recognise when we free to the wrong owner ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/b1d264d4..0b3de42e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=03-04 Stats: 30 lines in 1 file changed: 11 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Fri Jun 7 10:54:43 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 10:54:43 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v5] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 12:44:01 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with three additional 
commits since the last revision: >> >> - Fix access specifiers >> - Use access specifiers and class for I >> - Recognise when we free to the wrong owner > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 23: > >> 21: return idx == other.idx; >> 22: } >> 23: }; > > Why so much code? Why not just typedef to uint32? We want the index to be opaque, so we now use access specifiers to deny access to it for everyone except the allocator, which is a friend. On top of that, we use an `_owner` field in debug mode to assert that we don't free to the wrong allocator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1631024187 From jsjolen at openjdk.org Fri Jun 7 11:01:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Jun 2024 11:01:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v5] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 13:40:33 GMT, Thomas Stuefe wrote: > Also, please expose the init size to outside, possibly with a sensible default like your 8 Yeah, we can do this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1631033262 From lucy at openjdk.org Fri Jun 7 11:14:23 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 7 Jun 2024 11:14:23 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used.
> > Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. I feel very uncomfortable with this PR, at least as far as s390 is concerned. Many of the methods now set to be removed have been implemented with "implementation completeness" in mind. Not being used currently does not allow the implication of being obsolete. The approach on s390 has always been to provide support for potentially useful new instructions once they become available. Later exploitation can then focus on the use, without bloating the PR with low-level assembler* declarations. There are a few exceptions, though. One is the code around `_atomic_memory_operation_lock`. That seems to be a leftover which simply was forgotten. Same seems true for `pd_relocate_CodeBlob` ------------- Changes requested by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19550#pullrequestreview-2104286082 From jsjolen at openjdk.org Fri Jun 7 11:38:44 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 11:38:44 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v6] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. 
> > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
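The description above (4-byte "pointers" into a growable array, with unused slots forming a linked list) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `IndexedFreeListSketch` and its members are hypothetical names, `std::vector` stands in for HotSpot's `GrowableArray`, and the element type is restricted to trivial types to keep the sketch short. The debug-only owner check mentioned in the review is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical sketch: an allocator that hands out 4-byte indices
// instead of 8-byte pointers. Freed slots are chained into a free list.
template <typename E>
class IndexedFreeListSketch {
  static_assert(std::is_trivial<E>::value, "sketch only supports trivial types");

public:
  // Opaque handle: only the allocator may look at the raw index.
  class I {
    friend class IndexedFreeListSketch;
    std::int32_t idx;
    explicit I(std::int32_t i) : idx(i) {}
  public:
    I() : idx(-1) {}  // nil handle
    bool operator==(const I& other) const { return idx == other.idx; }
  };
  static_assert(sizeof(I) == sizeof(std::int32_t), "handle stays 4 bytes");

private:
  // A slot either holds a live element or the index of the next free slot.
  union Slot {
    E value;
    std::int32_t next_free;
  };

  std::vector<Slot> slots;       // backing storage; grows, never shrinks
  std::int32_t free_head = -1;   // -1 means the free list is empty

public:
  explicit IndexedFreeListSketch(std::size_t initial_capacity = 8) {
    slots.reserve(initial_capacity);
  }

  I allocate(const E& v) {
    std::int32_t i;
    if (free_head >= 0) {
      // Reuse the most recently freed slot.
      i = free_head;
      free_head = slots[i].next_free;
    } else {
      // No free slot: append a new one, possibly reallocating the array.
      i = static_cast<std::int32_t>(slots.size());
      slots.emplace_back();
    }
    slots[i].value = v;
    return I(i);
  }

  void free(I h) {
    // Push the slot onto the front of the free list.
    slots[h.idx].next_free = free_head;
    free_head = h.idx;
  }

  E& at(I h) { return slots[h.idx].value; }
  std::size_t slot_count() const { return slots.size(); }
};
```

Because handles are indices rather than addresses, they stay valid when the backing array reallocates, which is what makes the growable-array approach workable in the first place.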
Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Fix test bug - Expose an initial_capacity - More asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/0b3de42e..396aa698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=04-05 Stats: 36 lines in 2 files changed: 15 ins; 5 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Fri Jun 7 11:52:12 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Jun 2024 11:52:12 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with names of deleted methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. Hi, Removing dead code is great, but it has to be done with reasoning behind it and preferably in smaller batches, so as to be reviewable. I also don't understand why you comment out tests. If you can make these into smaller and more localized PRs, then I'll be happy to take a look if I think I'm the right person to review it.
All the best, Johan ------------- PR Comment: https://git.openjdk.org/jdk/pull/19550#issuecomment-2154676801 From jsjolen at openjdk.org Fri Jun 7 11:57:56 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 11:57:56 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v7] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Missed debug only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/396aa698..c457756d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From rehn at openjdk.org Fri Jun 7 11:59:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 7 Jun 2024 11:59:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Thu, 6 Jun 2024 20:56:46 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove tmp file > > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 72: > >> 70: bool is_jump() const { return MacroAssembler::is_jump_at(addr_at(0)); } >> 71: bool is_call() const { return is_call_at(addr_at(0)); } >> 72: static bool is_call_at(address addr); > > Is this indirection of `is_call_at` necessary? seems only is_call is calling is_call_at? 
This follows how the existing code is implemented: the implementation lives in a static method taking an address, and the member methods forward to that static method with their own instruction address. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1631096475 From rehn at openjdk.org Fri Jun 7 11:59:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 7 Jun 2024 11:59:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Fri, 7 Jun 2024 08:09:03 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove tmp file > > src/hotspot/cpu/riscv/relocInfo_riscv.cpp line 84: > >> 82: if (NativeCall::is_at(addr())) { >> 83: NativeCall* nc = nativeCall_at(addr()); >> 84: if (nc->reloc_set_destination(x)) { > > Seems there is a subtle difference here. Previously, there is a cache invalidation operation in `NativeCall::set_destination_mt_safe` [1] which is called by this `Relocation::pd_set_call_destination`. > Now it's gone with this change: it's not there even in `NativeShortCall::reloc_set_destination`. > Is that intended? > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/nativeInst_riscv.cpp#L96 Yes, we do a cache flush for the entire code range after all relocations are done.

void CodeBuffer::copy_code_to(CodeBlob* dest_blob) {
  ...
  relocate_code_to(&dest); // reloc done here
  ....
  // Flush generated code
  ICache::invalidate_range(dest_blob->code_begin(), dest_blob->code_size());
}

As we need to do this for the entire method anyway, I can't find a reason why reloc would need its own flush.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1631091872 From rehn at openjdk.org Fri Jun 7 11:59:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 7 Jun 2024 11:59:17 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: <3jyhG5L-3PLzTSIckYxLlCEgMD-lWgD80sAQAWmAET8=.6aef0226-a0aa-4f28-9c25-e9c877d8f810@github.com> On Fri, 7 Jun 2024 11:52:21 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/relocInfo_riscv.cpp line 84: >> >>> 82: if (NativeCall::is_at(addr())) { >>> 83: NativeCall* nc = nativeCall_at(addr()); >>> 84: if (nc->reloc_set_destination(x)) { >> >> Seems there is a subtle difference here. Previously, there is a cache invalidation operation in `NativeCall::set_destination_mt_safe` [1] which is called by this `Relocation::pd_set_call_destination`. >> Now it's gone with this change: it's not there even in `NativeShortCall::reloc_set_destination`. >> Is that intended? >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/nativeInst_riscv.cpp#L96 > > Yes, we do a cache flush for the entire code range after all relocations are done. > > void CodeBuffer::copy_code_to(CodeBlob* dest_blob) { > ... > relocate_code_to(&dest); // reloc done here > .... > // Flush generated code > ICache::invalidate_range(dest_blob->code_begin(), dest_blob->code_size()); > } > > > As we need to do this for the entire method anyway, I can't find a reason why reloc would need its own flush. I.e. only in the mt_safe case do we need an individual cache flush, and only if instructions were changed.
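The distinction drawn above (no flush per relocation, one flush over the whole range afterwards, and an individual flush only when live code is patched and actually changes) can be illustrated with a toy model. This is a sketch under stated assumptions, not HotSpot code: `FakeICache` and `FakeBlob` are hypothetical types, and the three functions only borrow the names used in this thread to show the flush pattern, not the real signatures or instruction encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the instruction cache: it only counts flushes.
struct FakeICache {
  int flushes = 0;
  std::size_t bytes = 0;
  void invalidate_range(const void* /*start*/, std::size_t size) {
    ++flushes;
    bytes += size;
  }
};

// Hypothetical code blob: one 32-bit word per "instruction".
struct FakeBlob {
  std::vector<std::uint32_t> code;
};

// Relocation-time patch. The blob is not being executed yet, so the
// flush is left to the caller, which does it once for the whole range.
inline void reloc_set_destination(FakeBlob& blob, std::size_t slot,
                                  std::uint32_t dest) {
  blob.code[slot] = dest;
}

// Patch of code that may already be running. Flush just the patched
// word, and only if the word actually changed.
inline bool set_destination_mt_safe(FakeBlob& blob, std::size_t slot,
                                    std::uint32_t dest, FakeICache& icache) {
  if (blob.code[slot] == dest) {
    return false;
  }
  blob.code[slot] = dest;
  icache.invalidate_range(&blob.code[slot], sizeof(std::uint32_t));
  return true;
}

// Copy the code, apply every relocation, then flush the whole range once.
inline void copy_code_to(
    const FakeBlob& src, FakeBlob& dest,
    const std::vector<std::pair<std::size_t, std::uint32_t>>& relocs,
    FakeICache& icache) {
  dest.code = src.code;
  for (const auto& r : relocs) {
    reloc_set_destination(dest, r.first, r.second);  // reloc done here
  }
  icache.invalidate_range(dest.code.data(),
                          dest.code.size() * sizeof(std::uint32_t));
}
```

In this model, copying a blob with any number of relocations costs exactly one flush, while a later `set_destination_mt_safe` call that does not change the word costs none.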
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1631094069 From jsjolen at openjdk.org Fri Jun 7 12:00:38 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 12:00:38 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v8] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Another one :-) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/c457756d..e70c056c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From rehn at openjdk.org Fri Jun 7 12:06:12 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 7 Jun 2024 12:06:12 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: <3jyhG5L-3PLzTSIckYxLlCEgMD-lWgD80sAQAWmAET8=.6aef0226-a0aa-4f28-9c25-e9c877d8f810@github.com> References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> <3jyhG5L-3PLzTSIckYxLlCEgMD-lWgD80sAQAWmAET8=.6aef0226-a0aa-4f28-9c25-e9c877d8f810@github.com> Message-ID: On Fri, 7 Jun 2024 11:54:37 GMT, Robbin Ehn wrote: >> Yes, we do an cache flush for the entire memory after all relocations are done. >> >> void CodeBuffer::copy_code_to(CodeBlob* dest_blob) { >> ... >> relocate_code_to(&dest); // reloc done here >> .... 
>> // Flush generated code >> ICache::invalidate_range(dest_blob->code_begin(), dest_blob->code_size()); >> } >> >> >> As we need to do this for the entire method anyway, I can't find a reason why reloc would need its own flush. > > I.e. only in the mt_safe case do we need an individual cache flush, and only if instructions were changed. Also, I split them up because it was very confusing to have relocations calling mt_safe. As you can see in FarCall they are quite different, so it made no sense to have both cases in the same method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1631102852 From eosterlund at openjdk.org Fri Jun 7 12:15:18 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 7 Jun 2024 12:15:18 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with names of deleted methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. Some parts of the code, such as the assemblers, can be seen as tools that we have in our shed so that we can write other powerful code. If you have a shed full of tools, then naturally you can go through the shed and get rid of the tools we don't seem to currently use. Who needs a spade anyway? Nobody has used that spade for a year! Except that eventually, the day always comes when you need a spade.
Since you have now thrown away the only spade in the shed, you will find yourself with the option to either 1) try to make do with a trowel, which is horrible but might work as a hack. Or 2) you have to make a new spade yet again. And no, we can't buy a ready made spade. It can be very annoying when you have what would seemingly be a trivial patch, but then you find out you won the lottery and you are apparently the first person in a while that needed a testl with a memory operand comparing against a 32 bit immediate, and have to go and read ISA manuals to figure out how to encode this thing correctly. It adds a large amount of extra work to add support for something that we should be able to take for granted. I'm not a big fan of throwing away all the tools we have in the shed just because they haven't been used in a while. I don't want to dig my next hole with a trowel, nor do I want to build a new spade that we already have. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19550#issuecomment-2154709857 From jsjolen at openjdk.org Fri Jun 7 12:18:42 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Jun 2024 12:18:42 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v9] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. 
> > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Add include guards ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/e70c056c..9f7eaa65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=07-08 Stats: 30 lines in 2 files changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From rehn at openjdk.org Fri Jun 7 12:50:25 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 7 Jun 2024 12:50:25 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v8] In-Reply-To: References: Message-ID: <4TVE4G5mkc7U0LI7pj_rd6jbG0vGyFTXWhwX86wA4tg=.3f823c2e-136f-431f-8e06-d61c652f36d2@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. 
> > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'master' into 8332689 - Remove tmp file - Prepare for dynamic NativeCall size - Only allow one calling convetion, i.e. fixed sized - Merge branch 'master' into 8332689 - Review comments - Move shart/far code to cpp - Cleanup - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - ... 
and 3 more: https://git.openjdk.org/jdk/compare/40b2fbd8...f93f588d ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=07 Stats: 907 lines in 16 files changed: 652 ins; 161 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From sgehwolf at openjdk.org Fri Jun 7 12:59:26 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 7 Jun 2024 12:59:26 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v4] In-Reply-To: References: Message-ID: > Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: > > > [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present > > > This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: > > > java -XshowSettings:system --version > Operating System Metrics: > Provider: cgroupv1 > System not containerized. > openjdk 23-internal 2024-09-17 > OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) > > > The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. 
Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. > > Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. > > Testing: > > - [x] GHA (risc-v failure seems infra related) > - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) > - [x] Some manual testing using cri-o > > Thoughts? Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'master' into jdk-8261242-is-containerized-fix - Add doc for mountinfo scanning. - Unify naming of variables - Merge branch 'master' into jdk-8261242-is-containerized-fix - Merge branch 'master' into jdk-8261242-is-containerized-fix - jcheck fixes - Fix tests - Implement Metrics.isContainerized() - Some clean-up - Drop cgroups testing on plain Linux - ... 
and 3 more: https://git.openjdk.org/jdk/compare/40b2fbd8...02884c70 ------------- Changes: https://git.openjdk.org/jdk/pull/18201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=03 Stats: 406 lines in 19 files changed: 301 ins; 78 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/18201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18201/head:pull/18201 PR: https://git.openjdk.org/jdk/pull/18201 From jwilhelm at openjdk.org Fri Jun 7 13:20:13 2024 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Fri, 7 Jun 2024 13:20:13 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. This seems to me like a classic case of not reading the instructions. @JohnTortugo, please read the [OpenJDK Developers' Guide](https://openjdk.org/guide/) before posting a PR with a huge change. There are sections in there that covers this exact case. The section [Contributing to an OpenJDK Project](https://openjdk.org/guide/#contributing-to-an-openjdk-project) really is mandatory reading before doing anything in OpenJDK. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19550#issuecomment-2154820179 From lucy at openjdk.org Fri Jun 7 13:23:15 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 7 Jun 2024 13:23:15 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities In-Reply-To: References: Message-ID: <20_PnuqseJW4aCapf90oXAvkdZs7VD_wA7Cw56odA-w=.4237eb1a-bc19-4990-94eb-1ce83357c153@github.com> On Sat, 1 Jun 2024 13:15:45 GMT, Amit Kumar wrote: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. Changes requested by lucy (Reviewer). src/hotspot/cpu/s390/assembler_s390.hpp line 3095: > 3093: // Ppopulation count intrinsics. > 3094: inline void z_flogr(Register r1, Register r2); // find leftmost one > 3095: inline void z_popcnt(Register r1, Register r2, int64_t m3 = 0); // population count I do not like optional parameters if not urgently required. Why not always pass a mask value (0 or 8)? src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5811: > 5809: > 5810: if (VM_Version::has_MiscInstrExt3()) { > 5811: z_llgfr(r_src, r_src); You should not modify the src register. 
Register allocation might not expect that. Copy into r_dst instead. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5815: > 5813: } else { > 5814: > 5815: #ifdef ASSERT You don't need this #ifdef if all enclosed code is just assert(). src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5839: > 5837: } else { > 5838: > 5839: #ifdef ASSERT You don't need this #ifdef if all enclosed code is just assert(). src/hotspot/cpu/s390/s390.ad line 10692: > 10690: > 10691: // Prefer compile-time assertion over run-time SIGILL. > 10692: assert(VM_Version::has_PopCount(), "bad predicate for popCountI_Ext3"); Duplicate check! src/hotspot/cpu/s390/s390.ad line 10714: > 10712: > 10713: // Prefer compile-time assertion over run-time SIGILL. > 10714: assert(VM_Version::has_PopCount(), "bad predicate for popCountL_Ext3"); Duplicate check! src/hotspot/cpu/s390/s390.ad line 10735: > 10733: > 10734: // Prefer compile-time assertion over run-time SIGILL. > 10735: assert(VM_Version::has_PopCount(), "bad predicate for popCountI"); Duplicate check! But you may want to check for different registers (if you don't trust the effect()). src/hotspot/cpu/s390/s390.ad line 10757: > 10755: > 10756: // Prefer compile-time assertion over run-time SIGILL. > 10757: assert(VM_Version::has_PopCount(), "bad predicate for popCountL"); Duplicate check! But you may want to check for different registers (if you don't trust the effect()).
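For readers following the popcnt discussion, the two behaviours quoted in the PR description (per-byte counts when bit 0 of the M3 field is zero, one total count when it is set and facility 3 is installed) can be modelled in portable C++. This is only a sketch to make the semantics concrete; it is not the s390 assembler code from the patch, and the function names `popcnt_per_byte`, `popcnt_total` and `sum_bytes` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Model of the per-byte mode: each result byte holds the number of
// one bits (0..8) in the corresponding source byte.
static uint64_t popcnt_per_byte(uint64_t x) {
  uint64_t result = 0;
  for (int byte = 0; byte < 8; byte++) {
    uint64_t b = (x >> (byte * 8)) & 0xFF;
    uint64_t count = 0;
    while (b != 0) {
      count += b & 1;
      b >>= 1;
    }
    result |= count << (byte * 8);
  }
  return result;
}

// Model of the total-count mode: a single count in the range 0..64.
static uint64_t popcnt_total(uint64_t x) {
  uint64_t count = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    count++;
  }
  return count;
}

// Without the total-count mode, the total has to be derived from the
// per-byte result. Multiplying by 0x01...01 accumulates all eight bytes
// into the top byte; the sum is at most 64, so it cannot overflow a byte.
static uint64_t sum_bytes(uint64_t per_byte) {
  return (per_byte * 0x0101010101010101ULL) >> 56;
}
```

Under these assumptions, `sum_bytes(popcnt_per_byte(x))` and `popcnt_total(x)` agree for every 64-bit `x`, which is why the fallback path needs extra instructions after the per-byte popcnt while the newer mode does not.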
------------- PR Review: https://git.openjdk.org/jdk/pull/19509#pullrequestreview-2104342546 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631083645 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631082265 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631173505 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631173942 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631180515 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631181798 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631182531 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631190032 From jsjolen at openjdk.org Fri Jun 7 13:35:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 7 Jun 2024 13:35:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v9] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 13:43:35 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Add include guards > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 75: > >> 73: E& translate(I i) { >> 74: return backing_storage.at(i.idx).e; >> 75: } > > - I'd just call this function "at" as we usually do. > - I also would provide a const variant that exposes const E&. > - Please assert for i to be correct (not nil, not oob) Out-of-bounds should be taken care of by GrowableArray, luckily.
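The three suggestions in the review comment above (rename to `at`, add a const overload, assert that the index is valid) could look roughly like the sketch below. It is not the code from the PR: `std::vector` stands in for HotSpot's GrowableArray, the free list itself is left out, and `IndexedStorageSketch`, `allocate` and `nil()` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the allocator under review. Elements are
// addressed by a 4-byte index instead of an 8-byte pointer.
template <typename E>
class IndexedStorageSketch {
public:
  struct I {
    int32_t idx;
    bool is_nil() const { return idx == -1; }
  };

  static I nil() { return I{-1}; }

  // Append an element and hand back its index.
  I allocate(const E& e) {
    _backing_storage.push_back(e);
    return I{static_cast<int32_t>(_backing_storage.size() - 1)};
  }

  // Mutable access; asserts that the index is neither nil nor out of bounds.
  E& at(I i) {
    assert(is_valid(i) && "nil or out-of-bounds index");
    return _backing_storage[static_cast<size_t>(i.idx)];
  }

  // Const overload exposing const E&.
  const E& at(I i) const {
    assert(is_valid(i) && "nil or out-of-bounds index");
    return _backing_storage[static_cast<size_t>(i.idx)];
  }

private:
  bool is_valid(I i) const {
    return !i.is_nil() && i.idx >= 0 &&
           static_cast<size_t>(i.idx) < _backing_storage.size();
  }

  std::vector<E> _backing_storage;
};
```

The explicit bounds check is needed here only because `std::vector::operator[]` does not check; as noted in the reply, a container that already asserts on out-of-bounds access leaves just the nil check to add.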
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1631210301 From jvernee at openjdk.org Fri Jun 7 13:35:15 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 7 Jun 2024 13:35:15 GMT Subject: RFR: 8325984: 4 jcstress tests are failing in Tier6 4 times each In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 10:48:51 GMT, Aleksey Shipilev wrote: >> These 4 tests were failing due to an incompatibility with jcstress. They were problemlisted in past (https://bugs.openjdk.org/browse/JDK-8326062). >> >> Now that jcstress has been updated (https://github.com/openjdk/jdk/pull/19332) with the relevant fix (https://github.com/openjdk/jcstress/pull/147), we can re-enable these tests. >> >> Testing: I've verified that all 4 tests now pass on Linux-x64 > > I think only Oracle CIs run these tests through jtreg wrappers? Anyway, this looks good to me. @shipilev Do you think this is trivial enough for one reviewer? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19565#issuecomment-2154849759 From amitkumar at openjdk.org Fri Jun 7 13:48:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 7 Jun 2024 13:48:13 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities In-Reply-To: <20_PnuqseJW4aCapf90oXAvkdZs7VD_wA7Cw56odA-w=.4237eb1a-bc19-4990-94eb-1ce83357c153@github.com> References: <20_PnuqseJW4aCapf90oXAvkdZs7VD_wA7Cw56odA-w=.4237eb1a-bc19-4990-94eb-1ce83357c153@github.com> Message-ID: On Fri, 7 Jun 2024 13:18:35 GMT, Lutz Schmidt wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. 
Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > src/hotspot/cpu/s390/s390.ad line 10757: > >> 10755: >> 10756: // Prefer compile-time assertion over run-time SIGILL. >> 10757: assert(VM_Version::has_PopCount(), "bad predicate for popCountL"); > > Duplicate check! But you may want to check for different registers (if you don't trust the effect(). I can add assert. But there is already one assert to check this in `macroAssembler_s390.cpp` file. Do you want one more here as well ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631239851 From lucy at openjdk.org Fri Jun 7 14:07:15 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 7 Jun 2024 14:07:15 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities In-Reply-To: References: <20_PnuqseJW4aCapf90oXAvkdZs7VD_wA7Cw56odA-w=.4237eb1a-bc19-4990-94eb-1ce83357c153@github.com> Message-ID: On Fri, 7 Jun 2024 13:45:20 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/s390.ad line 10757: >> >>> 10755: >>> 10756: // Prefer compile-time assertion over run-time SIGILL. >>> 10757: assert(VM_Version::has_PopCount(), "bad predicate for popCountL"); >> >> Duplicate check! But you may want to check for different registers (if you don't trust the effect(). > > I can add assert. But there is already one assert to check this in `macroAssembler_s390.cpp` file. Do you want one more here as well ? Ahhh sorry. Did not cross-check. It is OK then. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631273302 From rehn at openjdk.org Fri Jun 7 14:27:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 7 Jun 2024 14:27:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Thu, 6 Jun 2024 17:56:22 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove tmp file > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1571: > >> 1569: }; >> 1570: >> 1571: enum NativeShortCall { > > Thanks for moving these into a separate name space, looks much better. > > Seems the naming convention of enum is with "_", not sure if we need to stick to it. NativeShortCall also looks good. The common case should be e.g: `enum RoundingMode `, `enum SizePolicyTrueValues` But due to historical reason these NativeInst 'constants' use e.g.: `enum Intel_specific_constants` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1631301509 From amitkumar at openjdk.org Fri Jun 7 14:29:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 7 Jun 2024 14:29:38 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v2] In-Reply-To: References: Message-ID: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. 
> > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: comments from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/10960c2f..cbbdef54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=00-01 Stats: 14 lines in 3 files changed: 0 ins; 9 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From amitkumar at openjdk.org Fri Jun 7 14:29:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 7 Jun 2024 14:29:38 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v2] In-Reply-To: References: <20_PnuqseJW4aCapf90oXAvkdZs7VD_wA7Cw56odA-w=.4237eb1a-bc19-4990-94eb-1ce83357c153@github.com> Message-ID: On Fri, 7 Jun 2024 14:04:56 GMT, Lutz Schmidt wrote: >> I can add assert. But there is already one assert to check this in `macroAssembler_s390.cpp` file. Do you want one more here as well ? > > Ahhh sorry. Did not cross-check. It is OK then. 
Done :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1631304743 From shade at openjdk.org Fri Jun 7 14:31:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 7 Jun 2024 14:31:14 GMT Subject: RFR: 8325984: 4 jcstress tests are failing in Tier6 4 times each In-Reply-To: References: Message-ID: <_uQZcxrXOnPXBMLzSk9K2itTY0HTrWILaXe_Knl1EVU=.4775cf80-a348-406c-a2ba-39350315cad8@github.com> On Wed, 5 Jun 2024 19:21:56 GMT, Jorn Vernee wrote: > These 4 tests were failing due to an incompatibility with jcstress. They were problemlisted in past (https://bugs.openjdk.org/browse/JDK-8326062). > > Now that jcstress has been updated (https://github.com/openjdk/jdk/pull/19332) with the relevant fix (https://github.com/openjdk/jcstress/pull/147), we can re-enable these tests. > > Testing: I've verified that all 4 tests now pass on Linux-x64 I think this is fine and trivial. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19565#issuecomment-2154964073 From amitkumar at openjdk.org Fri Jun 7 14:41:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 7 Jun 2024 14:41:12 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: <4oTnnVeBbxCTfBDoQnldpIyHh8GlPcjXwVlmaPQPrrw=.5243b504-e336-4ff2-bb59-525766d78a34@github.com> On Tue, 28 May 2024 14:04:13 GMT, Martin Doerr wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit test and add assertion for array lenght. > > Performance seems to be not affected by that bug. Note that I have used https://github.com/openjdk/jdk/pull/19427 to run TypePollution micro benchmarks. 
@TheRealMDoerr I got one test failure on PPC with these changes: diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp index 6bfb260606b..70897a1066e 100644 --- a/src/hotspot/share/runtime/globals.hpp +++ b/src/hotspot/share/runtime/globals.hpp @@ -1988,13 +1988,13 @@ const int ObjectAlignmentInBytes = 8; "rewriting/transformation independently of the JVMTI " \ "can_{retransform/redefine}_classes capabilities.") \ \ - product(bool, UseSecondarySupersCache, true, DIAGNOSTIC, \ + product(bool, UseSecondarySupersCache, false, DIAGNOSTIC, \ "Use secondary supers cache during subtype checks.") \ \ - product(bool, UseSecondarySupersTable, false, DIAGNOSTIC, \ + product(bool, UseSecondarySupersTable, true, DIAGNOSTIC, \ "Use hash table to lookup secondary supers.") \ \ - product(bool, VerifySecondarySupers, false, DIAGNOSTIC, \ + product(bool, VerifySecondarySupers, true, DIAGNOSTIC, \ "Check that linear and hashed secondary lookups return the same result.") \ \ product(bool, StressSecondarySupers, false, DIAGNOSTIC, \ ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:./test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java >> 1 0 1 0 << ============================== TEST FAILURE But if I revert the changes I had done, then it passes. Same situation I'm facing on s390x. Is this expected ? 
failure log: [type_profile_failure.log](https://github.com/user-attachments/files/15741205/type_profile_failure.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2154983693 From kvn at openjdk.org Fri Jun 7 15:01:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Jun 2024 15:01:17 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) [v3] In-Reply-To: References: Message-ID: <8U7x8J0qcT9KdK3Tah56KLzt-zi5NRuQDpJjsoKOpeE=.b43eac87-2ab5-404e-90ee-d04a6104a22c@github.com> On Fri, 7 Jun 2024 03:50:50 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 443: >> >>> 441: >>> 442: /* FIXME: Uncomment while integrating JDK-8329032 >>> 443: bool save_apx = UseAPX; >> >> What are you missing to uncomment this code? >> 8329032 is about `.ad` file changes. It should not affect execution of this code. >> You need changes in `register_x86.*` files and maybe somewhere else but you don't need C2 changes for this code to work. > > Yes, we already have that in place with https://github.com/openjdk/jdk/pull/19042, which will be open for review after this patch. I added it in comments since this piece of logic is centered around CPUID feature check and pertinent to this patch. Okay.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1631347341 From kvn at openjdk.org Fri Jun 7 15:13:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Jun 2024 15:13:14 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) [v3] In-Reply-To: References: Message-ID: <4X90ounLkky3ETOC0PMSFTfaqGWIPATNVAGr03Ig4OY=.42c3d5ee-8c06-4e95-9392-b19d021f1621@github.com> On Fri, 7 Jun 2024 02:16:27 GMT, Jatin Bhateja wrote: >> Summary of changes included with the patch: >> >> 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) >> 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments addressed. src/hotspot/cpu/x86/vm_version_x86.cpp line 882: > 880: > 881: void VM_Version::report_apx_state_restore_warning() { > 882: tty->print("warning: Unsuccessful EGPRs state restoration across signal handling, setting UseAPX to false.\n"); This print is fine during development but I would instead save some value in memory to indicate that the OS does not save/restore APX. And then check it after we execute this assembler code. Similar to how we do that for AVX. You would not need to do a runtime call and this method then. Note: `tty->print()` can do "nasty"/unexpected things which you want to avoid.
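The shape of that suggestion (record the observation in memory from the probe, decide later in ordinary VM code) might be sketched as below. This is a plain C++ model, not the stub generator code; `ApxProbeResult`, `kProbePattern`, `record_probe` and `decide_use_apx` are hypothetical names, and the real check would read back extended registers from generated assembler rather than take a value as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pattern the probe loads into an extended register
// before triggering the signal.
constexpr uint64_t kProbePattern = 0xCAFEBABECAFEBABEULL;

// Memory slot the probe writes to; nothing is printed from the probe itself.
struct ApxProbeResult {
  uint64_t observed = 0;  // register value read back after signal handling
};

// Stand-in for what the generated stub would do: store what it saw.
static void record_probe(ApxProbeResult& result, uint64_t value_after_signal) {
  result.observed = value_after_signal;
}

// Evaluated afterwards in normal initialization code.
static bool os_preserves_extended_state(const ApxProbeResult& result) {
  return result.observed == kProbePattern;
}

// The feature stays enabled only if it was requested and the OS
// restored the extended register state correctly.
static bool decide_use_apx(bool requested, const ApxProbeResult& result) {
  return requested && os_preserves_extended_state(result);
}
```

With this split, the probe needs no runtime call, and any user-facing warning can be issued later from initialization code, outside the signal-handling path.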
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1631362108 From mdoerr at openjdk.org Fri Jun 7 15:20:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 7 Jun 2024 15:20:13 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 07:36:52 GMT, Andrew Haley wrote: >> Performance seems to be not affected by that bug. Note that I have used https://github.com/openjdk/jdk/pull/19427 to run TypePollution micro benchmarks. > >> Performance seems to be not affected by that bug. > > That is extremely suspicious. That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2155051645 From jvernee at openjdk.org Fri Jun 7 15:44:16 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 7 Jun 2024 15:44:16 GMT Subject: Integrated: 8325984: 4 jcstress tests are failing in Tier6 4 times each In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 19:21:56 GMT, Jorn Vernee wrote: > These 4 tests were failing due to an incompatibility with jcstress. They were problemlisted in past (https://bugs.openjdk.org/browse/JDK-8326062). > > Now that jcstress has been updated (https://github.com/openjdk/jdk/pull/19332) with the relevant fix (https://github.com/openjdk/jcstress/pull/147), we can re-enable these tests. > > Testing: I've verified that all 4 tests now pass on Linux-x64 This pull request has now been integrated. 
Changeset: ee82346b Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/ee82346bd5ecf3024d6dc7b7529598099483a42c Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod 8325984: 4 jcstress tests are failing in Tier6 4 times each Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/19565 From ccheung at openjdk.org Fri Jun 7 16:11:36 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 7 Jun 2024 16:11:36 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v8] In-Reply-To: References: Message-ID: <-pZLP33SmfOP-3djnyabR8eaNZqVem1sLJYWD7412Qc=.4c84c9ab-392e-4b03-a0c0-e8a9a679999b@github.com> > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. 
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: remove Arguments::perf_class_link() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18790/files - new: https://git.openjdk.org/jdk/pull/18790/files/4c224f55..c62f5e4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=06-07 Stats: 30 lines in 6 files changed: 1 ins; 11 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From ccheung at openjdk.org Fri Jun 7 16:11:36 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 7 Jun 2024 16:11:36 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> <24n-bMIYvAF10yob4_Z5t1DPG_nrWypoQYE25zQ499U=.1a77aa5e-1005-4d32-9ae7-ec745838e449@github.com> <0gZ9cLiP3bLH1LNL71dwnuaUY5iN-Ewea-Qyw7eGe44=.f8fa485f-5947-47fc-a8eb-cafd1c56165d@github.com> <0B3DXYpRB8P6bEQP2ACupaLG9RRAfEe3PflYvpE3ORs=.4a2b8b09-2743-48ac-baff-f2fc6df3944b@github.com> Message-ID: On Fri, 7 Jun 2024 02:53:18 GMT, Ioi Lam wrote: > The latest version looks good to me. My only suggestion is to use `log_is_enabled(Info, perf, class, link)` directly, because it's very efficient. > Thanks for checking it. I also did some perf testing using `log_is_enabled(Info, perf, class, link)` and the results are similar to using `Arguments::perf_class_link()`. So I've pushed another commit based on your suggestion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1631427524 From sgibbons at openjdk.org Fri Jun 7 17:05:33 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 7 Jun 2024 17:05:33 GMT Subject: Integrated: 8320448: Accelerate IndexOf using AVX2 In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 00:06:19 GMT, Scott Gibbons wrote: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 This pull request has now been integrated. 
Changeset: 8e72d7cf Author: Scott Gibbons Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/8e72d7cf8e7dfc7eb9e66bc562f125f947e37f49 Stats: 3906 lines in 16 files changed: 3876 ins; 0 del; 30 mod 8320448: Accelerate IndexOf using AVX2 Reviewed-by: epeter, kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/16753 From aph at openjdk.org Fri Jun 7 17:39:15 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 7 Jun 2024 17:39:15 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v2] In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 14:29:38 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comments from Lutz The benchmark should be included in this patch. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19509#pullrequestreview-2105071322 From szaldana at openjdk.org Fri Jun 7 17:39:20 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 7 Jun 2024 17:39:20 GMT Subject: Integrated: 8330420: Inverted use of DisplayVMOutputToStderr in ostream_exit In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:31:31 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses the inverted clauses in ostream_exit. > > Thanks, > Sonia This pull request has now been integrated. Changeset: 512b2b4f Author: Sonia Zaldana Calles Committer: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/512b2b4f141f9a202984150b0427372e1a409a50 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8330420: Inverted use of DisplayVMOutputToStderr in ostream_exit Reviewed-by: jsjolen, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/18897 From stuefe at openjdk.org Fri Jun 7 17:40:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 7 Jun 2024 17:40:18 GMT Subject: RFR: 8333775: Small improvement to outputStream auto-indentation mode Message-ID: Almost trivial enhancement. [JDK-8333211](https://bugs.openjdk.org/browse/JDK-8333211) added automatic indentation. Some changes to complement that: - let outputStream::set_autoindent() return the old value for later restoration - add an RAII object to enable autoindent and restore the old state when leaving.
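The two changes listed above fit together as in the following sketch. It is a minimal model and not the code from the PR: `StreamSketch` stands in for outputStream, and the guard name `AutoIndentMark` is invented here, since the message does not name the RAII class.

```cpp
#include <cassert>

// Hypothetical stand-in for outputStream, reduced to the autoindent flag.
class StreamSketch {
  bool _autoindent = false;
public:
  // Sets the new mode and returns the previous one so callers can restore it.
  bool set_autoindent(bool value) {
    bool old = _autoindent;
    _autoindent = value;
    return old;
  }
  bool autoindent() const { return _autoindent; }
};

// RAII guard: enables autoindent on construction and restores the
// previous state when the scope is left.
class AutoIndentMark {
  StreamSketch& _st;
  const bool _old;
public:
  explicit AutoIndentMark(StreamSketch& st)
    : _st(st), _old(st.set_autoindent(true)) {}
  ~AutoIndentMark() { _st.set_autoindent(_old); }

  AutoIndentMark(const AutoIndentMark&) = delete;
  AutoIndentMark& operator=(const AutoIndentMark&) = delete;
};
```

Because the guard restores the old value rather than unconditionally switching autoindent off, nested scopes behave as expected: an inner guard leaving its scope keeps the mode that the outer guard enabled.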
------------- Commit messages: - JDK-8333775-Small-improvement-to-outputStream-auto-indentation-mode Changes: https://git.openjdk.org/jdk/pull/19592/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19592&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333775 Stats: 35 lines in 5 files changed: 20 ins; 9 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19592.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19592/head:pull/19592 PR: https://git.openjdk.org/jdk/pull/19592 From duke at openjdk.org Fri Jun 7 17:49:34 2024 From: duke at openjdk.org (Elif Aslan) Date: Fri, 7 Jun 2024 17:49:34 GMT Subject: RFR: 8312412: Uninitialized klassVtable::_verify_count field Message-ID: Initialize the uninitialized klassVtable::_verify_count field ------------- Commit messages: - Initialize verify_count Changes: https://git.openjdk.org/jdk/pull/19602/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19602&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312412 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19602/head:pull/19602 PR: https://git.openjdk.org/jdk/pull/19602 From shade at openjdk.org Fri Jun 7 18:20:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 7 Jun 2024 18:20:11 GMT Subject: RFR: 8312412: Uninitialized klassVtable::_verify_count field In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 17:42:12 GMT, Elif Aslan wrote: > [JDK-8312412](https://bugs.openjdk.org/browse/JDK-8312412) > > Initialize the uninitialized klassVtable::_verify_count field; without proper initialization, it could lead to incorrect verification logic or misleading debug information. > Tested locally Looks good to me. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19602#pullrequestreview-2105131981 From shade at openjdk.org Fri Jun 7 18:53:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 7 Jun 2024 18:53:11 GMT Subject: RFR: 8312412: Uninitialized klassVtable::_verify_count field In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 17:42:12 GMT, Elif Aslan wrote: > [JDK-8312412](https://bugs.openjdk.org/browse/JDK-8312412) > > Initialize the uninitialized klassVtable::_verify_count field. Without proper initialization, it could lead to incorrect verification logic or misleading debug information. > Tested locally. A normal guideline is to wait 24 hours for everyone to be able to look at the change, unless reviewers think this is trivial. I do believe this is trivial, but let someone else look at this too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19602#issuecomment-2155349131 From cslucas at openjdk.org Fri Jun 7 19:19:19 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 7 Jun 2024 19:19:19 GMT Subject: RFR: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with the names of the deleted methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. Closing this as not relevant. Thank you for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19550#issuecomment-2155383826 From cslucas at openjdk.org Fri Jun 7 19:19:19 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 7 Jun 2024 19:19:19 GMT Subject: Withdrawn: 8333566: Remove unused methods In-Reply-To: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> References: <3vCVNjjgFzEw_YgoboTLXemBuzwht9uEeZxC8yg5Zog=.a7600184-c4ca-4a93-8b76-74df7bdff405@github.com> Message-ID: On Tue, 4 Jun 2024 20:51:52 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to remove unused methods from the code base. To the best of my knowledge, these methods are only defined but never used. > > Here is a list with names of delete methods: https://gist.github.com/JohnTortugo/fccc29781a1b584c03162aa4e160e874 > > Tested with Linux x86_64 tier1-4, GHA, and only cross building to other platforms. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19550 From phh at openjdk.org Fri Jun 7 19:35:16 2024 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 7 Jun 2024 19:35:16 GMT Subject: RFR: 8312412: Uninitialized klassVtable::_verify_count field In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 17:42:12 GMT, Elif Aslan wrote: > [JDK-8312412](https://bugs.openjdk.org/browse/JDK-8312412) > > Initilizating Uninitialized klassVtable::_verify_count field, without proper initialization, it could lead to incorrect verification logic or misleading debug information. > Tested locally Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19602#pullrequestreview-2105242365 From duke at openjdk.org Fri Jun 7 19:35:16 2024 From: duke at openjdk.org (Elif Aslan) Date: Fri, 7 Jun 2024 19:35:16 GMT Subject: Integrated: 8312412: Uninitialized klassVtable::_verify_count field In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 17:42:12 GMT, Elif Aslan wrote: > [JDK-8312412](https://bugs.openjdk.org/browse/JDK-8312412) > > Initilizating Uninitialized klassVtable::_verify_count field, without proper initialization, it could lead to incorrect verification logic or misleading debug information. > Tested locally This pull request has now been integrated. Changeset: c37d02ae Author: Elif Aslan Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/c37d02aef38da178fcf56e3c5cccc41cc5175421 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8312412: Uninitialized klassVtable::_verify_count field Reviewed-by: shade, phh ------------- PR: https://git.openjdk.org/jdk/pull/19602 From jbhateja at openjdk.org Sat Jun 8 04:16:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 8 Jun 2024 04:16:24 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v4] In-Reply-To: References: Message-ID: <7FpzzRiVoeGjqOjIxTSfxdudzZcx20q7DcKTTVSWhQA=.60cb64ba-3806-4e51-8296-696bad19720d@github.com> > Summary of changes include with the patch:- > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Lazy restored state comparison after OS signal handling. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18562/files - new: https://git.openjdk.org/jdk/pull/18562/files/68df08ce..d8fcde93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=02-03 Stats: 40 lines in 2 files changed: 17 ins; 16 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/18562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18562/head:pull/18562 PR: https://git.openjdk.org/jdk/pull/18562 From jbhateja at openjdk.org Sat Jun 8 04:16:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 8 Jun 2024 04:16:24 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v3] In-Reply-To: <4X90ounLkky3ETOC0PMSFTfaqGWIPATNVAGr03Ig4OY=.42c3d5ee-8c06-4e95-9392-b19d021f1621@github.com> References: <4X90ounLkky3ETOC0PMSFTfaqGWIPATNVAGr03Ig4OY=.42c3d5ee-8c06-4e95-9392-b19d021f1621@github.com> Message-ID: On Fri, 7 Jun 2024 15:10:57 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments addressed. > > src/hotspot/cpu/x86/vm_version_x86.cpp line 882: > >> 880: >> 881: void VM_Version::report_apx_state_restore_warning() { >> 882: tty->print("warning: Unsuccessful EGPRs state restoration across signal handling, setting UseAPX to false.\n"); > > This print is fine during development but I would instead save some value in memory to indicate that OS does not save/restore APX. And then check it after we execute this assembler code. Similar how we do that for AVX. > You would not need to do runtime call and this method then. > Note: `tty->print()` can do "nasty"/unexpected things which you want to avoid. 
Hi @vnkozlov, doing a lazy restored state comparison now to align with existing AVX handling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18562#discussion_r1631865373 From amitkumar at openjdk.org Sat Jun 8 05:28:51 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 8 Jun 2024 05:28:51 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v3] In-Reply-To: References: Message-ID: <-UUFYRERNgu8oZjvGj-aO0d44JyNmoy5Mipe45mdffk=.a9439604-f652-4a5c-a013-b67f47713adb@github.com> > We need to move the popcnt instruction implementation out of the s390.ad file, as it is required by some methods present in [JDK-8331126](https://bugs.openjdk.org/browse/JDK-8331126). > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression.
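The two POPCNT behaviours quoted above can be modeled in portable C++ as below. This is only a sketch of the semantics as the quoted text describes them, not the s390 assembler code from the pull request; the function names are invented for illustration, and the multiply-based fold is one common way to sum byte counts, assumed here rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Number of one bits in a single byte (result is in the range 0..8).
static inline uint64_t bits_in_byte(uint8_t b) {
  uint64_t n = 0;
  while (b != 0) {
    n += b & 1u;
    b = static_cast<uint8_t>(b >> 1);
  }
  return n;
}

// Model of the per-byte mode: each byte of the result holds the bit
// count of the corresponding byte of the input.
uint64_t popcnt_per_byte(uint64_t r2) {
  uint64_t r1 = 0;
  for (int i = 0; i < 8; i++) {
    uint8_t byte = static_cast<uint8_t>(r2 >> (8 * i));
    r1 |= bits_in_byte(byte) << (8 * i);
  }
  return r1;
}

// Model of the total mode: the result is the bit count of the whole
// 64-bit input (0..64).
uint64_t popcnt_total(uint64_t r2) {
  uint64_t n = 0;
  for (int i = 0; i < 8; i++) {
    n += bits_in_byte(static_cast<uint8_t>(r2 >> (8 * i)));
  }
  return n;
}

// Folds a per-byte result into a total. Multiplying by 0x0101...01
// accumulates all eight byte counts in the top byte; no byte can
// overflow because the sum is at most 64.
uint64_t fold_byte_counts(uint64_t per_byte) {
  return (per_byte * 0x0101010101010101ULL) >> 56;
}
```

The fold step shows why the total mode is attractive: with only the per-byte form available, extra instructions are needed after the count to combine the eight partial results.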
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: adds PopCount benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/cbbdef54..00a6df82 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=01-02 Stats: 70 lines in 1 file changed: 70 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From stuefe at openjdk.org Sat Jun 8 06:28:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 8 Jun 2024 06:28:12 GMT Subject: RFR: 8326085: Remove unnecessary UpcallContext constructor In-Reply-To: References: Message-ID: <5WMwi2lT3eM1l2ONsn5P7sUIvsWpKdIkk4wjNMoqR9M=.67a4099f-d6a8-4903-abcb-7748cfb993a7@github.com> On Fri, 26 Apr 2024 17:42:48 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the explicit constructor to UpcallContext (hotspot/share/prims/upcallLinker.cpp) that was added as workaround for [8286891](https://bugs.openjdk.org/browse/JDK-8286891). > > The minimum required version of XLC has since been bumped in [8325880](https://bugs.openjdk.org/browse/JDK-8325880), so we can remove this. > > Thanks, > Sonia @JoKern65 tested and its fine. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18982#pullrequestreview-2105779590 From amitkumar at openjdk.org Sat Jun 8 14:06:30 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 8 Jun 2024 14:06:30 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: removes warmup code, changes time unit and iteration number ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/00a6df82..53889759 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=02-03 Stats: 16 lines in 1 file changed: 0 ins; 13 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From kvn at openjdk.org Sat Jun 8 16:17:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 8 Jun 2024 16:17:14 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v4] In-Reply-To: <7FpzzRiVoeGjqOjIxTSfxdudzZcx20q7DcKTTVSWhQA=.60cb64ba-3806-4e51-8296-696bad19720d@github.com> References: <7FpzzRiVoeGjqOjIxTSfxdudzZcx20q7DcKTTVSWhQA=.60cb64ba-3806-4e51-8296-696bad19720d@github.com> Message-ID: <1TZX__fVTCvGNWxwyFfucdxunEKOlYbJJ7Bvw46XDTQ=.5cc7ca7c-4bea-4cf9-b6ef-9c79dff84ef4@github.com> On Sat, 8 Jun 2024 04:16:24 GMT, Jatin Bhateja wrote: >> Summary of changes include with the patch:- >> >> 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) >> 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Lazy restored state comparison after OS signal handling. Good. Let me test it. 
------------- PR Review: https://git.openjdk.org/jdk/pull/18562#pullrequestreview-2105876232 From kvn at openjdk.org Sat Jun 8 20:46:24 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 8 Jun 2024 20:46:24 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v4] In-Reply-To: <7FpzzRiVoeGjqOjIxTSfxdudzZcx20q7DcKTTVSWhQA=.60cb64ba-3806-4e51-8296-696bad19720d@github.com> References: <7FpzzRiVoeGjqOjIxTSfxdudzZcx20q7DcKTTVSWhQA=.60cb64ba-3806-4e51-8296-696bad19720d@github.com> Message-ID: On Sat, 8 Jun 2024 04:16:24 GMT, Jatin Bhateja wrote: >> Summary of changes include with the patch:- >> >> 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) >> 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Lazy restored state comparison after OS signal handling. My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18562#pullrequestreview-2105937938 From jbhateja at openjdk.org Sun Jun 9 00:50:21 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 9 Jun 2024 00:50:21 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions =?UTF-8?B?KEludGVswq4=?= APX) [v2] In-Reply-To: References: Message-ID: On Thu, 6 Jun 2024 19:42:52 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > And we don't have HW currently. 
Thanks @vnkozlov and @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2156247123 From jbhateja at openjdk.org Sun Jun 9 00:50:21 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 9 Jun 2024 00:50:21 GMT Subject: Integrated: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) In-Reply-To: References: Message-ID: On Mon, 1 Apr 2024 12:01:27 GMT, Jatin Bhateja wrote: > Summary of changes included with the patch: > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: a9413973 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/a941397327972f130e683167a1b429f17603df46 Stats: 195 lines in 8 files changed: 169 ins; 10 del; 16 mod 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18562 From qpzhang at openjdk.org Sun Jun 9 01:40:33 2024 From: qpzhang at openjdk.org (Patrick Zhang) Date: Sun, 9 Jun 2024 01:40:33 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v10] In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 09:52:37 GMT, Liming Liu wrote: >> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream.
This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Guard more madv numbers FYI, https://github.com/oracle/linux-uek/issues/23 has recently fixed the madv flag conflict issue, at the three concerned branches: `UEK7U2`, `UEK6U3`, and `UEK5U5`. - #define MADV_DOEXEC 22 /* do inherit across exec */ - #define MADV_DONTEXEC 23 /* don't inherit across exec */ + #define MADV_DOEXEC 201 /* do inherit across exec */ + #define MADV_DONTEXEC 202 /* don't inherit across exec */ [uek7/u2](https://github.com/oracle/linux-uek/commits/uek7/u2/include/uapi/asm-generic/mman-common.h): https://github.com/oracle/linux-uek/commit/606472268c9ca1edb06b3f0e17477a6b8f229c29 [uek6/u3](https://github.com/oracle/linux-uek/commits/uek6/u3/include/uapi/asm-generic/mman-common.h): https://github.com/oracle/linux-uek/commit/b5974a18d78ec21bc75107b15f8fff8ccb81d19f [uek5/u5](https://github.com/oracle/linux-uek/commits/uek5/u5/include/uapi/asm-generic/mman-common.h): https://github.com/oracle/linux-uek/commit/818b8c8d7993659f12e5254067de2eed5519a349 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2156262397 From dnsimon at openjdk.org Sun Jun 9 15:37:12 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 9 Jun 2024 15:37:12 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC In-Reply-To: References: Message-ID: On Thu, 30 May 2024 21:58:08 GMT, Vladimir Kozlov wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. 
This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > And it broke RISC build based on GHA failure for cross compilation. @vnkozlov @fisk any feedback on changes since you approved? I'd love to merge this asap. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2156657191 From kvn at openjdk.org Sun Jun 9 18:57:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 9 Jun 2024 18:57:18 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v3] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 16:36:49 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: > > - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC > - Merge branch 'master' into tkr-genz > - Merge branch 'master' into tkr-genz > - Use NativeAccess to read from handles > - Enable support for UseEpsilonGC > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC Last version is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19490#pullrequestreview-2106342028 From fyang at openjdk.org Mon Jun 10 03:31:13 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 10 Jun 2024 03:31:13 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v8] In-Reply-To: <4TVE4G5mkc7U0LI7pj_rd6jbG0vGyFTXWhwX86wA4tg=.3f823c2e-136f-431f-8e06-d61c652f36d2@github.com> References: <4TVE4G5mkc7U0LI7pj_rd6jbG0vGyFTXWhwX86wA4tg=.3f823c2e-136f-431f-8e06-d61c652f36d2@github.com> Message-ID: <_MuwK3i7Ru8buBWkrto34BYlmlfDFPKDBwsp3Vaq2S8=.93e16dca-260a-4e4d-b414-fc66412bd5c7@github.com> On Fri, 7 Jun 2024 12:50:25 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. 
>> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. >> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merge branch 'master' into 8332689 > - Remove tmp file > - Prepare for dynamic NativeCall size > - Only allow one calling convetion, i.e. fixed sized > - Merge branch 'master' into 8332689 > - Review comments > - Move shart/far code to cpp > - Cleanup > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - ... 
and 3 more: https://git.openjdk.org/jdk/compare/40b2fbd8...f93f588d > _Mailing list message from [Andrew Haley](mailto:aph-open at littlepinkcloud.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.org):_ > > On 5/29/24 15:28, Robbin Ehn wrote: > > > On some CPUs L1D and L1I can't contain the same cache line, which means > > the tramopline stub can bounce from L1I->L1D->L1I, which is > > expensive. > > Wouldn't it be a lot easier simply to put the target address loaded by the trampoline into the constant pool? Seem to me that will be more cleaner than the current solution (`MacroAssembler::emit_address_stub` which uses `trampoline_stub_Relocation::spec` relocation holder but emits an 'address stub' instead of a real trampline). And I see PPC is putting the entry point as a constant into the constant pool [1] when emitting a call with trampoline stub. [1] [MacroAssembler::emit_address_stub](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L1308) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2157129523 From stuefe at openjdk.org Mon Jun 10 05:17:28 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 10 Jun 2024 05:17:28 GMT Subject: RFR: 8330174: Establish no-access zone at the start of Klass encoding range [v2] In-Reply-To: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com> References: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com> Message-ID: > After having reserved an address range for the Klass encoding range, we either: > a) Place CDS, then class space, into that address range > b) Place only class space in that range (if CDS is off). > > For an nKlass of 0, the decoded Klasspointer points to the beginning of the encoding range. Since nKlass=0 is a special value, both CDS (a) and Metaspace (b) ensure that no Klass is placed right at the start of the Klass range. 
> > However, it would also be good to establish a no-access zone at the range's start. Dereferencing an nKlass=0 would then result in an immediate, obvious crash instead of in reading invalid data. > > This would closely mimic what we do in the compressed-oops-enabled java heap (albeit there we do it for fault-based null checks, too) and what Operating Systems do with low-address ranges. > > --- > > The patch: > > We can neither move the encoding base down one page (the encoding base is carefully chosen to fit the platform's decoding). Nor can we move CDS archive space up one page (since CDS relies on the archive being placed exactly at the encoding base address). Nor do we want to move class space up (since class space start has a high alignment requirement of 16MB, protection zone would need to be 16MB large, which is a waste of address space). > > Instead, as before, we just let Metaspace and CDS handle the protection zone internally. For Metaspace, this is very simple. We just protect the first page of class space. > > For CDS, it is a tiny bit more complex since we need to leave a "protection-zone-shaped hole" in the first region of the archive when we dump it. We do just that and then give that region a new property, "has protection zone". At runtime, we protect the underlying memory if a mapped region has a protection zone. > > With CDS, because the page size can differ between dump- and runtime, the protection zone is the size of CDS core region alignment, not page-sized (e.g. dumping on Linux aarch64 with 4KB pages shall generate an archive that can be used in Docker on MacOS with 16KB pages). > > ---- > > Tests: > - ran CDS and AppCDS jtreg tests manually on Mac m1 > - manually tested that decoding, then dereferencing an nKlass=0 gives us the new "Fault address is narrow Klass base - dereferencing a zero nKlass?" 
output in the hs-err file > - GHAs (which include the new regression test) Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix - Update metaspace.cpp - cds-metaspace-prot-prefix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19290/files - new: https://git.openjdk.org/jdk/pull/19290/files/983bf39d..0477e957 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=00-01 Stats: 27773 lines in 580 files changed: 20211 ins; 5102 del; 2460 mod Patch: https://git.openjdk.org/jdk/pull/19290.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19290/head:pull/19290 PR: https://git.openjdk.org/jdk/pull/19290 From rehn at openjdk.org Mon Jun 10 05:59:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 10 Jun 2024 05:59:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v8] In-Reply-To: <_MuwK3i7Ru8buBWkrto34BYlmlfDFPKDBwsp3Vaq2S8=.93e16dca-260a-4e4d-b414-fc66412bd5c7@github.com> References: <4TVE4G5mkc7U0LI7pj_rd6jbG0vGyFTXWhwX86wA4tg=.3f823c2e-136f-431f-8e06-d61c652f36d2@github.com> <_MuwK3i7Ru8buBWkrto34BYlmlfDFPKDBwsp3Vaq2S8=.93e16dca-260a-4e4d-b414-fc66412bd5c7@github.com> Message-ID: On Mon, 10 Jun 2024 03:28:08 GMT, Fei Yang wrote: > > _Mailing list message from [Andrew Haley](mailto:aph-open at littlepinkcloud.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.org):_ > > On 5/29/24 15:28, Robbin Ehn wrote: > > > On some CPUs L1D and L1I can't contain the same cache line, which means > > > the tramopline stub can bounce from L1I->L1D->L1I, which is > > > expensive. 
> > > > > > Wouldn't it be a lot easier simply to put the target address loaded by the trampoline into the constant pool? > > Seem to me that will be more cleaner than the current solution (`MacroAssembler::emit_address_stub` which uses `trampoline_stub_Relocation::spec` relocation holder but emits an 'address stub' instead of a real trampline). And I see PPC is putting the entry point as a constant into the constant pool [1] when emitting a call with trampoline stub. > > [1] [MacroAssembler::emit_address_stub](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L1308) This was just a bit easier as I have both cases. I'll look into cp. Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2157328503 From rehn at openjdk.org Mon Jun 10 06:10:22 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 10 Jun 2024 06:10:22 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v9] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. 
> > This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Remove tmp file - Prepare for dynamic NativeCall size - Only allow one calling convetion, i.e. fixed sized - Merge branch 'master' into 8332689 - Review comments - Move shart/far code to cpp - Cleanup - Merge branch 'master' into 8332689 - ...
and 4 more: https://git.openjdk.org/jdk/compare/a9413973...742c6561 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=08 Stats: 907 lines in 16 files changed: 652 ins; 161 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From dholmes at openjdk.org Mon Jun 10 08:01:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 10 Jun 2024 08:01:17 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v8] In-Reply-To: <-pZLP33SmfOP-3djnyabR8eaNZqVem1sLJYWD7412Qc=.4c84c9ab-392e-4b03-a0c0-e8a9a679999b@github.com> References: <-pZLP33SmfOP-3djnyabR8eaNZqVem1sLJYWD7412Qc=.4c84c9ab-392e-4b03-a0c0-e8a9a679999b@github.com> Message-ID: On Fri, 7 Jun 2024 16:11:36 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. >> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. >> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. 
> > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > remove Arguments::perf_class_link() I can't decide whether using the logging state to control the counter initialization as well as their printing is clever, or combining things the wrong way. When you have additional counters enabled by slightly different logging settings, I can't see how you will compose things. I also still wonder about the overhead of the empty perf timer events in heavy-duty classloading applications. src/hotspot/share/interpreter/linkResolver.cpp line 1735: > 1733: > 1734: PerfTraceTimedEvent timer(ClassLoader::perf_resolve_invokehandle_time(), > 1735: ClassLoader::perf_resolve_invokehandle_count()); What does this do when the counters are not enabled? src/hotspot/share/runtime/java.cpp line 160: > 158: } > 159: > 160: void log_vm_stats(outputStream *st) { I assume this generic name is because in the future it will print a lot more VM stats? src/hotspot/share/runtime/java.cpp line 164: > 162: if (log.is_enabled()) { > 163: ClassLoader::print_counters(st); > 164: } Probably worth adding a comment here as to why we actually print to the passed-in stream and not the log stream, given we check if the log stream is enabled. Someone could easily think this is a typo/bug. src/hotspot/share/runtime/threads.cpp line 835: > 833: log.print_cr("At VM initialization completion:"); > 834: log_vm_stats(&log); > 835: } If we are going to have more types of VM stats in the future, it is not clear how you will change this if-condition? Nor what stream you would pass in. ???
------------- PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2106917892 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1632745676 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1632753151 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1632756003 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1632761703 From thartmann at openjdk.org Mon Jun 10 09:33:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 10 Jun 2024 09:33:21 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v8] In-Reply-To: References: Message-ID: On Wed, 5 Jun 2024 03:52:24 GMT, kuaiwei wrote: >> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: >> 1 It shows a regression on some platforms, like Apple silicon on macOS >> 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" >> >> It can be fixed by: >> 1 Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we see a performance improvement (N1 or N2) >> 2 Checking for the special pattern and merging the subsequent dmb. >> >> It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these. >> >> This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions but cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I think it's not a common pattern. >> >> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging. But it looks risky to keep an instruction pending during emitting. > > kuaiwei has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: > > - Merge master > - Use constexpr for test encoding > - Add comment in aarch64.ad > - Remove tailing white space > - Refine merge dmb test cases > - Add more unit tests > - Make MacroAssembler::merge more clear > - 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier Sorry for the delay, I had to re-run a subset of the benchmarks due to high variance. All green now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2157834049 From jsjolen at openjdk.org Mon Jun 10 09:37:11 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 10 Jun 2024 09:37:11 GMT Subject: RFR: 8333775: Small improvement to outputStream auto-indentation mode In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 08:40:22 GMT, Thomas Stuefe wrote: > Almost trivial enhancement. > > [JDK-8333211](https://bugs.openjdk.org/browse/JDK-8333211) added automatic indentation. Some changes to complement that: > > - let outputStream::set_autoindent() return the old value for later restoration > - add an RAII object to enable autoindent and restore the old state when leaving. Hi Thomas, LGTM but can you make the `StreamAutoIndentor` `NONCOPYABLE' also? Copying it makes no sense. Trivial. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19592#pullrequestreview-2107269818 From shade at openjdk.org Mon Jun 10 09:49:16 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Jun 2024 09:49:16 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v8] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 09:30:25 GMT, Tobias Hartmann wrote: > Sorry for the delay, I had to re-run a subset of the benchmarks due to high variance. All green now. Great, thanks for testing. I think we are ready to integrate this now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2157870940 From dnsimon at openjdk.org Mon Jun 10 09:49:17 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 10 Jun 2024 09:49:17 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v3] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 16:36:49 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC > - Merge branch 'master' into tkr-genz > - Merge branch 'master' into tkr-genz > - Use NativeAccess to read from handles > - Enable support for UseEpsilonGC > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC Thanks for all the reviews and input. I'm integrating this on Tom's behalf since he is on vacation. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2157871422 From mbaesken at openjdk.org Mon Jun 10 09:52:12 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 10 Jun 2024 09:52:12 GMT Subject: RFR: 8333775: Small improvement to outputStream auto-indentation mode In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 08:40:22 GMT, Thomas Stuefe wrote: > Almost trivial enhancement. > > [JDK-8333211](https://bugs.openjdk.org/browse/JDK-8333211) added automatic indentation. Some changes to complement that: > > - let outputStream::set_autoindent() return the old value for later restoration > - add an RAII object to enable autoindent and restore the old state when leaving. Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19592#pullrequestreview-2107320315 From eosterlund at openjdk.org Mon Jun 10 10:03:14 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 10 Jun 2024 10:03:14 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v3] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 16:36:49 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. 
> > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC > - Merge branch 'master' into tkr-genz > - Merge branch 'master' into tkr-genz > - Use NativeAccess to read from handles > - Enable support for UseEpsilonGC > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC Still good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2157900015 From aboldtch at openjdk.org Mon Jun 10 10:03:14 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 10 Jun 2024 10:03:14 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v3] In-Reply-To: References: Message-ID: On Tue, 4 Jun 2024 16:36:49 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: > > - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC > - Merge branch 'master' into tkr-genz > - Merge branch 'master' into tkr-genz > - Use NativeAccess to read from handles > - Enable support for UseEpsilonGC > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC Just note it still says `JVMIC`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2157902536 From stuefe at openjdk.org Mon Jun 10 10:23:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 10 Jun 2024 10:23:26 GMT Subject: RFR: 8322475: Extend printing for System.map [v4] In-Reply-To: References: Message-ID: > This is an expansion on the new `System.map` command introduced with JDK-8318636. > > We now print valuable information per memory region, such as: > > - the actual resident set size > - the actual number of huge pages > - the actual used page size > - the THP state of the region (was advised, is eligible, uses THP, ...) > - whether the region is shared > - whether the region had been committed (backed by swap) > - whether the region has been swapped out. 
> > Example output: > > > from to size rss hugetlb pgsz prot notes vm info/file > 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage > 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage > 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') > 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'non-profiled nmethods') > 0x00007f3a7c802000 - 0x00007f3a839f200... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - fix merge issue - Merge branch 'master' into System.maps-more-info - fix whitespace issue - wip - exhuming - Merge branch 'master' into System.maps-more-info - Merge - remove codecache name printing - stefank feedback - remove page size histo - ... 
and 8 more: https://git.openjdk.org/jdk/compare/a9413973...14c17f8e ------------- Changes: https://git.openjdk.org/jdk/pull/17158/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17158&range=03 Stats: 646 lines in 14 files changed: 464 ins; 98 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/17158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17158/head:pull/17158 PR: https://git.openjdk.org/jdk/pull/17158 From stuefe at openjdk.org Mon Jun 10 10:34:50 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 10 Jun 2024 10:34:50 GMT Subject: RFR: 8333775: Small improvement to outputStream auto-indentation mode [v2] In-Reply-To: References: Message-ID: > Almost trivial enhancement. > > [JDK-8333211](https://bugs.openjdk.org/browse/JDK-8333211) added automatic indentation. Some changes to complement that: > > - let outputStream::set_autoindent() return the old value for later restoration > - add an RAII object to enable autoindent and restore the old state when leaving. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback johann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19592/files - new: https://git.openjdk.org/jdk/pull/19592/files/4d70ec07..9599ebaf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19592&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19592&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19592.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19592/head:pull/19592 PR: https://git.openjdk.org/jdk/pull/19592 From chagedorn at openjdk.org Mon Jun 10 10:42:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Jun 2024 10:42:15 GMT Subject: RFR: 8329141: Obsolete RTM flags and code In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 02:00:47 GMT, Vladimir Kozlov wrote: > Obsolete HotSpot RTM flags which were deprecated in JDK 23. 
> RTM related VM code and tests were removed. > > Tested tier1-3,stress,xcomp Otherwise, looks good to me. src/hotspot/share/adlc/output_c.cpp line 1617: > 1615: } > 1616: > 1617: if (node->is_ideal_fastlock() && new_inst->is_ideal_fastlock()) { You should also update the copyright year of this file. src/hotspot/share/opto/parse1.cpp line 2228: > 2226: // the check will fold. > 2227: Node* profile_state = makecon(TypeInt::make(ProfileRTM)); > 2228: Node* opq = _gvn.transform( new Opaque3Node(C, rtm_state, Opaque3Node::RTM_OPT) ); This was the last (and probably the only ever) use of `Opaque3Nodes`, so you could remove this class as well. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19589#pullrequestreview-2107082285 PR Review Comment: https://git.openjdk.org/jdk/pull/19589#discussion_r1632837314 PR Review Comment: https://git.openjdk.org/jdk/pull/19589#discussion_r1632841502 From stuefe at openjdk.org Mon Jun 10 12:36:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 10 Jun 2024 12:36:16 GMT Subject: Integrated: 8333775: Small improvement to outputStream auto-indentation mode In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 08:40:22 GMT, Thomas Stuefe wrote: > Almost trivial enhancement. > > [JDK-8333211](https://bugs.openjdk.org/browse/JDK-8333211) added automatic indentation. Some changes to complement that: > > - let outputStream::set_autoindent() return the old value for later restoration > - add an RAII object to enable autoindent and restore the old state when leaving. This pull request has now been integrated. 
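The two changes listed in the description above (a setter that returns the old value, plus an RAII object that restores it) can be sketched roughly as follows. This is a minimal illustration only, not the code from the changeset: `ToyStream` and `ToyAutoIndentor` are hypothetical stand-ins for `outputStream` and `StreamAutoIndentor`, and the deleted copy operations stand in for the `NONCOPYABLE` request made in review.

```cpp
// Hypothetical stand-in for HotSpot's outputStream; only the auto-indent
// flag is modeled here.
class ToyStream {
  bool _autoindent = false;
public:
  // Sets the flag and returns the previous value so a caller can restore it.
  bool set_autoindent(bool value) {
    bool old = _autoindent;
    _autoindent = value;
    return old;
  }
  bool autoindent() const { return _autoindent; }
};

// Hypothetical RAII guard: enables auto-indent on construction and restores
// the previous state when the scope is left.
class ToyAutoIndentor {
  ToyStream* const _stream;
  const bool _old;
public:
  explicit ToyAutoIndentor(ToyStream* s)
    : _stream(s), _old(s->set_autoindent(true)) {}
  ~ToyAutoIndentor() { _stream->set_autoindent(_old); }

  // Copying a guard makes no sense (it would restore the state twice),
  // so copy construction and assignment are disabled.
  ToyAutoIndentor(const ToyAutoIndentor&) = delete;
  ToyAutoIndentor& operator=(const ToyAutoIndentor&) = delete;
};
```

Because the setter hands back the old value, nested guards restore correctly: an inner guard created while auto-indent is already on leaves it on when it goes out of scope.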
Changeset: e22fc121 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/e22fc121aed56dad2eedfdc3a53f2a655c3b200b Stats: 36 lines in 5 files changed: 21 ins; 8 del; 7 mod 8333775: Small improvement to outputStream auto-indentation mode Reviewed-by: jsjolen, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/19592 From duke at openjdk.org Mon Jun 10 13:00:27 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 10 Jun 2024 13:00:27 GMT Subject: Integrated: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier In-Reply-To: References: Message-ID: On Fri, 17 May 2024 08:57:20 GMT, kuaiwei wrote: > The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It shows a regression on some platforms, like Apple silicon on macOS > 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > It can be fixed by: > 1 Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we see a performance improvement (N1 or N2) > 2 Checking for the special pattern and merging the subsequent dmb. > > It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these. > > This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions but cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I think it's not a common pattern. > > In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging. But it looks risky to keep an instruction pending during emitting. This pull request has now been integrated.
Changeset: 2a242db0 Author: Kuai Wei Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/2a242db01ed1d502affa4a954e601266fa98dfbe Stats: 523 lines in 9 files changed: 510 ins; 0 del; 13 mod 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier Reviewed-by: shade, aph ------------- PR: https://git.openjdk.org/jdk/pull/19278 From lucy at openjdk.org Mon Jun 10 13:10:15 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 10 Jun 2024 13:10:15 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Sat, 8 Jun 2024 14:06:30 GMT, Amit Kumar wrote: >> We need to move the popcnt instruction implementation out of the s390.ad file, as it is required by some methods present in [JDK-8331126](https://bugs.openjdk.org/browse/JDK-8331126). >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > removes warmup code, changes time unit and iteration number I like it much better now (aka LGTM). Would be nice if you would remove the comments I complained about. src/hotspot/cpu/s390/s390.ad line 10691: > 10689: Register Rsrc = $src$$Register; > 10690: > 10691: // Prefer compile-time assertion over run-time SIGILL.
This comment isn't very helpful anymore. src/hotspot/cpu/s390/s390.ad line 10711: > 10709: Register Rsrc = $src$$Register; > 10710: > 10711: // Prefer compile-time assertion over run-time SIGILL. This comment isn't very helpful anymore. src/hotspot/cpu/s390/s390.ad line 10731: > 10729: Register Rtmp = $tmp$$Register; > 10730: > 10731: // Prefer compile-time assertion over run-time SIGILL. This comment isn't very helpful anymore. src/hotspot/cpu/s390/s390.ad line 10752: > 10750: Register Rtmp = $tmp$$Register; > 10751: > 10752: // Prefer compile-time assertion over run-time SIGILL. This comment isn't very helpful anymore. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19509#pullrequestreview-2107741714 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633215587 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633216081 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633217583 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633218248 From aph at openjdk.org Mon Jun 10 13:10:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Jun 2024 13:10:17 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Sat, 8 Jun 2024 14:06:30 GMT, Amit Kumar wrote: >> We need to move the popcnt instruction implementation out of the s390.ad file, as it is required by some methods present in [JDK-8331126](https://bugs.openjdk.org/browse/JDK-8331126). >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8.
>> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > removes warmup code, changes time unit and iteration number src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5816: > 5814: > 5815: assert(r_tmp != noreg, "temp register required for popcnt, for machines < z15"); > 5816: assert_different_registers(r_dst, r_tmp); // if r_src is same as r_tmp, it should be fine Please move these assertions out of the if block. We want to ensure this for testing purposes at all times, so that even if we test on Z15 we expect it to run on others. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5838: > 5836: > 5837: assert(r_tmp != noreg, "temp register required for popcnt, for machines < z15"); > 5838: assert_different_registers(r_dst, r_tmp); // if r_src is same as r_tmp, it should be fine Same here. src/hotspot/cpu/s390/macroAssembler_s390.hpp line 3: > 1: /* > 2: * Copyright (c) 2016, 2024, Oracle and/or its affiliates. All rights reserved. > 3: * Copyright (c) 2016, 2024 SAP SE. All rights reserved. Add IBM copyright here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633219071 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633219593 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633220458 From lucy at openjdk.org Mon Jun 10 13:19:15 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 10 Jun 2024 13:19:15 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:06:32 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> removes warmup code, changes time unit and iteration number > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5816: > >> 5814: >> 5815: assert(r_tmp != noreg, "temp register required for popcnt, for machines < z15"); >> 5816: assert_different_registers(r_dst, r_tmp); // if r_src is same as r_tmp, it should be fine > > Please move these assertions out of the if block. We want to ensure this for testing purposes at all times, so that even if we test on Z15 we expect it to run on others. @theRealAph, I understand your argument. Nonetheless, I'm not happy with this request. The match rules in s390.ad have been designed such that, in case the more potent instruction is available, the register allocator does not need to provide a temp register. 
Regards, Lutz ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633235453 From amitkumar at openjdk.org Mon Jun 10 13:38:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 10 Jun 2024 13:38:13 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:16:55 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5816: >> >>> 5814: >>> 5815: assert(r_tmp != noreg, "temp register required for popcnt, for machines < z15"); >>> 5816: assert_different_registers(r_dst, r_tmp); // if r_src is same as r_tmp, it should be fine >> >> Please move these assertions out of the if block. We want to ensure this for testing purposes at all times, so that even if we test on Z15 we expect it to run on others. > > @theRealAph, > I understand your argument. Nonetheless, I'm not happy with this request. The match rules in s390.ad have been designed such that, in case the more potent instruction is available, the register allocator does not need to provide a temp register. > Regards, Lutz I agree with Lutz. I have added a separate match rule which is applicable only to machines `Z15 onwards`. It does not pass `r_tmp`, so `r_tmp` will be treated as `noreg` and will not satisfy the first assert. > so that even if we test on Z15 we expect it to run on others. The `else` block will never run on `Z15 or Z16` machines.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633265264 From amitkumar at openjdk.org Mon Jun 10 13:43:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 10 Jun 2024 13:43:41 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v5] In-Reply-To: References: Message-ID: > We need to move the popcnt instruction implementation out of the s390.ad file, as it is required by some methods present in [JDK-8331126](https://bugs.openjdk.org/browse/JDK-8331126). > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression.
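The two instruction behaviours quoted in the description above can be modeled in plain C++ as a rough sketch. This is not the s390 macro-assembler code from the patch; the function names are hypothetical, and the multiply-based fold is just one common way to turn per-byte counts into a total, not necessarily the sequence the patch emits with its temp register.

```cpp
#include <cstdint>

// Per-byte form (M3 bit 0 is zero, or facility 3 not installed): byte i of
// the result holds the number of one bits in byte i of x, each in 0..8.
inline uint64_t popcnt_per_byte(uint64_t x) {
  uint64_t result = 0;
  for (int i = 0; i < 8; i++) {
    uint64_t byte = (x >> (8 * i)) & 0xFF;
    uint64_t count = 0;
    while (byte != 0) {
      count += byte & 1;
      byte >>= 1;
    }
    result |= count << (8 * i);
  }
  return result;
}

// Total form (facility 3 installed and M3 bit 0 is one): the number of one
// bits in the whole 64-bit value, in 0..64.
inline uint64_t popcnt_total(uint64_t x) {
  uint64_t count = 0;
  while (x != 0) {
    x &= x - 1;  // clears the lowest set bit
    count++;
  }
  return count;
}

// Assumption for illustration: fold the eight per-byte counts into a total
// by summing them into the top byte. Each count is at most 8, so the sum is
// at most 64 and cannot overflow a byte.
inline uint64_t fold_byte_counts(uint64_t per_byte) {
  return (per_byte * 0x0101010101010101ULL) >> 56;
}
```

On hardware with only the per-byte form, some extra work of this kind is needed after the instruction, which is why the older code path wants a temp register while the newer one does not.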
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: adds ibm copyright header & remove redundant comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/53889759..b9acd6d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=03-04 Stats: 5 lines in 2 files changed: 1 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From amitkumar at openjdk.org Mon Jun 10 13:43:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 10 Jun 2024 13:43:42 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:07:22 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> removes warmup code, changes time unit and iteration number > > src/hotspot/cpu/s390/macroAssembler_s390.hpp line 3: > >> 1: /* >> 2: * Copyright (c) 2016, 2024, Oracle and/or its affiliates. All rights reserved. >> 3: * Copyright (c) 2016, 2024 SAP SE. All rights reserved. > > Add IBM copyright here? 
done :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633273341 From szaldana at openjdk.org Mon Jun 10 13:44:22 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 10 Jun 2024 13:44:22 GMT Subject: Integrated: 8326085: Remove unnecessary UpcallContext constructor In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:42:48 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the explicit constructor to UpcallContext (hotspot/share/prims/upcallLinker.cpp) that was added as workaround for [8286891](https://bugs.openjdk.org/browse/JDK-8286891). > > The minimum required version of XLC has since been bumped in [8325880](https://bugs.openjdk.org/browse/JDK-8325880), so we can remove this. > > Thanks, > Sonia This pull request has now been integrated. Changeset: e0afe0b5 Author: Sonia Zaldana Calles Committer: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/e0afe0b5e4f9bfa1f608be98e0a4f3bb4a7e4d30 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8326085: Remove unnecessary UpcallContext constructor Reviewed-by: kbarrett, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/18982 From aph at openjdk.org Mon Jun 10 13:55:18 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Jun 2024 13:55:18 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:35:29 GMT, Amit Kumar wrote: >> @theRealAph, >> I understand your argument. Nonetheless, I'm not happy with this request. The match rules in s390.ad have been designed such that, in case the more potent instruction is available, the register allocator does not need to provide a temp register. >> Regards, Lutz > > I agree with Lutz. I have added a separate match rule which will be applicable only for machines `Z15 onwards`. Which will not pass `r_tmp` and it will be considered `noreg` will not satisfy first assert requirement. 
> >> so that even if we test on Z15 we expect it to run on others. > > `else` block will never run on `Z15 or Z16` machines. I see. I wasn't so much thinking about the match rules, but anywhere else that needs to use popcount. While usages from the ad file are undoubtedly safe, anywhere else must provide a register to use or have some sort of conditionals at the point pop_count_int is used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633292332 From mbaesken at openjdk.org Mon Jun 10 13:57:48 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 10 Jun 2024 13:57:48 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' Message-ID: When running with ubsan enabled binaries, in a number of tests like jdk/jfr/event/runtime/TestShutdownEvent.jtr jdk/jfr/jvm/TestDumpOnCrash.jtr we get those ubsan-errors : src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 #2 0x7f0bd0502e7b () #3 0x7f0bd04fe01f () #4 0x7f0bd04fe01f () #5 0x7f0bd04fe525 () #6 0x7f0bd04f6c85 () #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 #10 0x7f0beed32cf8 in invokeStaticMainWithArgs src/java.base/share/native/libjli/java.c:418 #11 0x7f0beed35894 in JavaMain src/java.base/share/native/libjli/java.c:623 #12 0x7f0beed3cf68 in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) Looks like we use unsafe to 
put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). ------------- Commit messages: - JDK-8333887 Changes: https://git.openjdk.org/jdk/pull/19630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19630&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333887 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19630/head:pull/19630 PR: https://git.openjdk.org/jdk/pull/19630 From stuefe at openjdk.org Mon Jun 10 14:17:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 10 Jun 2024 14:17:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v9] In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 10:49:54 GMT, Johan Sjölen wrote: > Now we use access specifiers to deny access to the index, except for the allocator, which is a friend. We do want the index to be opaque, so we do this. On top of that, we use an `_owner` field in debug mode to assert that we don't free to the wrong allocator. The last part I don't like. It causes `I` to quadruple in size in debug builds only. For something that is potentially used as a member in a bunch of different data structures, it can cause considerable deltas in memory layout between debug and release builds. I don't think that is a good thing. What you test in debug should be close to what's running at a customer. I also don't think it is particularly useful. I would instead do a simple range check on `I`. That gives you a part of the sanity checks without the negatives. Couple that with a sanity check with respect to slot state (don't free free slots) and you are pretty well covered already. Other than that, I think using an own class to hide a simple i32 is unnecessarily complex, but I leave that up to you.
If you do this, please take care of constness. And make it work for its money, e.g. by giving it an is_nil instead of exposing nil, and requiring the user to manually compare it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1633324959 From stuefe at openjdk.org Mon Jun 10 14:17:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 10 Jun 2024 14:17:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v9] In-Reply-To: References: Message-ID: <0wzHsav13as-hrZeVpnwSt7QUPH-iabCbTEd3xmXUW8=.bfa2e209-56cb-42a9-ac34-93cab4b64a54@github.com> On Fri, 7 Jun 2024 12:18:42 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Add include guards src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 68: > 66: char e[sizeof(E)]; > 67: > 68: BackingElement() { Init list? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1633328767 From amitkumar at openjdk.org Mon Jun 10 14:18:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 10 Jun 2024 14:18:13 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:52:11 GMT, Andrew Haley wrote: >anywhere else must provide a register to use or have some sort of conditionals at the point pop_count_int is used. So for machines `<=Z14` we will never go in the `if` block; And the moment we enter in `else` block there is check, which confirms `r_tmp` shouldn't be `noreg` which will force author to pass a temporary register. Isn't it ?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633330779 From aph at openjdk.org Mon Jun 10 15:15:14 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Jun 2024 15:15:14 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 14:15:44 GMT, Amit Kumar wrote: > > anywhere else must provide a register to use or have some sort of conditionals at the point pop_count_int is used. > > So for machines `<=Z14` we will never go in the `if` block; And the moment we enter in `else` block there is check, which confirms `r_tmp` shouldn't be `noreg` which will force author to pass a temporary register. Isn't it ? The point is to avoid the failure where testing is done on Z15 and up, but some customers have earlier machines. A maintenance programmer uses popcount and all the testing on Z15 passes. The first person to see the failure is the customer. We're setting a trap for maintainers. This is not a complaint about the ad file use case. It's about anywhere else that popcount is used in hand-written assembly code. I'm telling you this from painful experience on AArch64. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633423607 From kvn at openjdk.org Mon Jun 10 15:33:26 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 10 Jun 2024 15:33:26 GMT Subject: RFR: 8329141: Obsolete RTM flags and code [v2] In-Reply-To: References: Message-ID: > Obsolete HotSpot RTM flags which were deprecated in JDK 23. > RTM related VM code and tests were removed. 
> > Tested tier1-3,stress,xcomp Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Addressed comments, removed Opaque3 node which is now unused ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19589/files - new: https://git.openjdk.org/jdk/pull/19589/files/fcfb62fd..45f8e73c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19589&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19589&range=00-01 Stats: 39 lines in 6 files changed: 0 ins; 37 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19589/head:pull/19589 PR: https://git.openjdk.org/jdk/pull/19589 From kvn at openjdk.org Mon Jun 10 15:33:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 10 Jun 2024 15:33:27 GMT Subject: RFR: 8329141: Obsolete RTM flags and code [v2] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 10:39:53 GMT, Christian Hagedorn wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed comments, removed Opaque3 node which is now unused > > Otherwise, looks good to me. Thank you, @chhagedorn, for review. I addressed your comments. > src/hotspot/share/adlc/output_c.cpp line 1617: > >> 1615: } >> 1616: >> 1617: if (node->is_ideal_fastlock() && new_inst->is_ideal_fastlock()) { > > You should also update the copyright year of this file. Done > src/hotspot/share/opto/parse1.cpp line 2228: > >> 2226: // the check will fold. >> 2227: Node* profile_state = makecon(TypeInt::make(ProfileRTM)); >> 2228: Node* opq = _gvn.transform( new Opaque3Node(C, rtm_state, Opaque3Node::RTM_OPT) ); > > This was the last (and probably the only ever) use of `Opaque3Nodes`, so you could remove this class as well. 
Done ------------- PR Comment: https://git.openjdk.org/jdk/pull/19589#issuecomment-2158665625 PR Review Comment: https://git.openjdk.org/jdk/pull/19589#discussion_r1633448708 PR Review Comment: https://git.openjdk.org/jdk/pull/19589#discussion_r1633448955 From aph at openjdk.org Mon Jun 10 15:36:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Jun 2024 15:36:13 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 15:12:47 GMT, Andrew Haley wrote: >>>anywhere else must provide a register to use or have some sort of conditionals at the point pop_count_int is used. >> >> So for machines `<=Z14` we will never go in the `if` block; And the moment we enter in `else` block there is check, which confirms `r_tmp` shouldn't be `noreg` which will force author to pass a temporary register. Isn't it ? > >> > anywhere else must provide a register to use or have some sort of conditionals at the point pop_count_int is used. >> >> So for machines `<=Z14` we will never go in the `if` block; And the moment we enter in `else` block there is check, which confirms `r_tmp` shouldn't be `noreg` which will force author to pass a temporary register. Isn't it ? > > The point is to avoid the failure where testing is done on Z15 and up, but some customers have earlier machines. A maintenance programmer uses popcount and all the testing on Z15 passes. The first person to see the failure is the customer. We're setting a trap for maintainers. > > This is not a complaint about the ad file use case. It's about anywhere else that popcount is used in hand-written assembly code. I'm telling you this from painful experience on AArch64. It'd be safer to break `pop_count_long` into two parts, for new and old versions, then use the two parts explicitly in s390x.ad patterns. 
Say, like this: instruct popCountI_Ext3(iRegI dst, iRegI src, flagsReg cr) %{ match(Set dst (PopCountI src)); effect(TEMP_DEF dst, KILL cr); predicate(UsePopCountInstruction && VM_Version::has_PopCount() && VM_Version::has_MiscInstrExt3()); ins_cost(DEFAULT_COST); size(8); // popcnt + llgfr format %{ "POPCNT $dst,$src\t # pop count int" %} ins_encode %{ Register Rdst = $dst$$Register; Register Rsrc = $src$$Register; __ pop_count_int_post_z15(Rdst, Rsrc); %} ins_pipe(pipe_class_dummy); %} void MacroAssembler::pop_count_int(Register r_dst, Register r_src, Register r_tmp) { assert(r_tmp != noreg, "temp register required for popcnt, for machines < z15"); assert_different_registers(r_dst, r_tmp); // if r_src is same as r_tmp, it should be fine if (VM_Version::has_MiscInstrExt3()) { pop_count_int_post_z15(r_dst, r_src); } else { pop_count_int_pre_z15(r_dst, r_src, r_tmp); } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633454445 From amitkumar at openjdk.org Mon Jun 10 15:47:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 10 Jun 2024 15:47:13 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: References: Message-ID: <8OgV6y1wcSkIFDNt5y7QdF53Gy3Hj4SAbv3src0K14g=.d26770d5-1b72-4104-9d84-a2aa260fb69f@github.com> On Mon, 10 Jun 2024 15:33:32 GMT, Andrew Haley wrote: >>> > anywhere else must provide a register to use or have some sort of conditionals at the point pop_count_int is used. >>> >>> So for machines `<=Z14` we will never go in the `if` block; And the moment we enter in `else` block there is check, which confirms `r_tmp` shouldn't be `noreg` which will force author to pass a temporary register. Isn't it ? >> >> The point is to avoid the failure where testing is done on Z15 and up, but some customers have earlier machines. A maintenance programmer uses popcount and all the testing on Z15 passes. The first person to see the failure is the customer. 
We're setting a trap for maintainers. >> >> This is not a complaint about the ad file use case. It's about anywhere else that popcount is used in hand-written assembly code. I'm telling you this from painful experience on AArch64. > > It'd be safer to break `pop_count_long` into two parts, for new and old versions, then use the two parts explicitly in s390x.ad patterns. Say, like this: > > > instruct popCountI_Ext3(iRegI dst, iRegI src, flagsReg cr) %{ > match(Set dst (PopCountI src)); > effect(TEMP_DEF dst, KILL cr); > predicate(UsePopCountInstruction && > VM_Version::has_PopCount() && > VM_Version::has_MiscInstrExt3()); > ins_cost(DEFAULT_COST); > size(8); // popcnt + llgfr > format %{ "POPCNT $dst,$src\t # pop count int" %} > ins_encode %{ > Register Rdst = $dst$$Register; > Register Rsrc = $src$$Register; > > __ pop_count_int_post_z15(Rdst, Rsrc); > > %} > ins_pipe(pipe_class_dummy); > %} > > > > > void MacroAssembler::pop_count_int(Register r_dst, Register r_src, Register r_tmp) { > assert(r_tmp != noreg, "temp register required for popcnt, for machines < z15"); > assert_different_registers(r_dst, r_tmp); // if r_src is same as r_tmp, it should be fine > > if (VM_Version::has_MiscInstrExt3()) { > pop_count_int_post_z15(r_dst, r_src); > } else { > pop_count_int_pre_z15(r_dst, r_src, r_tmp); > } > } Okay I'm fine with your suggestion. But I only hope that nobody directly uses `z_popcnt` instruction. Otherwise this solution is not gonna prevent the bug you described. Maybe I should put a comment in `assembler_s390.hpp` file as well, to have a look at the helper method present in `macroAssembler_s390.hpp` file instead of using it vanilla. 
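The disagreement about where the assert should live can be illustrated with a small standalone model. This is a hedged sketch, not HotSpot code: `Machine`, `check_inside_else` and `check_up_front` are hypothetical names, and a boolean return value stands in for a debug-build `assert` firing. It shows that a temp-register check placed only in the pre-z15 branch never fires when tests run on z15, while an unconditional check fires on every machine.

```cpp
#include <cassert>

// Hypothetical stand-in for the CPU feature query; not a HotSpot type.
struct Machine {
  bool has_misc_instr_ext3;  // true on z15 and later in this model
};

// Both functions return true if a missing temp register would be detected.

// Variant 1: check only on the path that actually needs the temp register.
bool check_inside_else(const Machine& m, bool tmp_provided) {
  if (m.has_misc_instr_ext3) {
    return false;          // new-instruction path: temp is never inspected
  } else {
    return !tmp_provided;  // old path: a missing temp is caught
  }
}

// Variant 2: check before dispatching, regardless of the machine.
bool check_up_front(const Machine& m, bool tmp_provided) {
  (void)m;                 // the result does not depend on the test machine
  return !tmp_provided;
}
```

With variant 1, a caller that forgets the temp register passes all testing on z15 and only fails on an older machine; with variant 2, the same mistake is caught wherever the tests run.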
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633471193 From aph at openjdk.org Mon Jun 10 16:17:14 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Jun 2024 16:17:14 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4] In-Reply-To: <8OgV6y1wcSkIFDNt5y7QdF53Gy3Hj4SAbv3src0K14g=.d26770d5-1b72-4104-9d84-a2aa260fb69f@github.com> References: <8OgV6y1wcSkIFDNt5y7QdF53Gy3Hj4SAbv3src0K14g=.d26770d5-1b72-4104-9d84-a2aa260fb69f@github.com> Message-ID: On Mon, 10 Jun 2024 15:45:02 GMT, Amit Kumar wrote: >> It'd be safer to break `pop_count_long` into two parts, for new and old versions, then use the two parts explicitly in s390x.ad patterns. Say, like this: >> >> >> instruct popCountI_Ext3(iRegI dst, iRegI src, flagsReg cr) %{ >> match(Set dst (PopCountI src)); >> effect(TEMP_DEF dst, KILL cr); >> predicate(UsePopCountInstruction && >> VM_Version::has_PopCount() && >> VM_Version::has_MiscInstrExt3()); >> ins_cost(DEFAULT_COST); >> size(8); // popcnt + llgfr >> format %{ "POPCNT $dst,$src\t # pop count int" %} >> ins_encode %{ >> Register Rdst = $dst$$Register; >> Register Rsrc = $src$$Register; >> >> __ pop_count_int_post_z15(Rdst, Rsrc); >> >> %} >> ins_pipe(pipe_class_dummy); >> %} >> >> >> >> >> void MacroAssembler::pop_count_int(Register r_dst, Register r_src, Register r_tmp) { >> assert(r_tmp != noreg, "temp register required for popcnt, for machines < z15"); >> assert_different_registers(r_dst, r_tmp); // if r_src is same as r_tmp, it should be fine >> >> if (VM_Version::has_MiscInstrExt3()) { >> pop_count_int_post_z15(r_dst, r_src); >> } else { >> pop_count_int_pre_z15(r_dst, r_src, r_tmp); >> } >> } > > Okay I'm fine with your suggestion. But I only hope that nobody directly uses `z_popcnt` instruction. Otherwise this solution is not gonna prevent the bug you described. 
Maybe I should put a comment in `assembler_s390.hpp` file as well, to have a look at the helper method present in `macroAssembler_s390.hpp` file instead of using it vanilla. There's no need to hope, make `z_popcnt` protected. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1633510367 From kvn at openjdk.org Mon Jun 10 18:01:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 10 Jun 2024 18:01:22 GMT Subject: Integrated: 8329141: Obsolete RTM flags and code In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 02:00:47 GMT, Vladimir Kozlov wrote: > Obsolete HotSpot RTM flags which were deprecated in JDK 23. > RTM related VM code and tests were removed. > > Tested tier1-3,stress,xcomp This pull request has now been integrated. Changeset: 96911537 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/96911537557dd95cd11598cd9a9f4e64e05e6aac Stats: 6435 lines in 99 files changed: 27 ins; 6371 del; 37 mod 8329141: Obsolete RTM flags and code Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19589 From mli at openjdk.org Mon Jun 10 18:35:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 10 Jun 2024 18:35:14 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3] In-Reply-To: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com> References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com> Message-ID: On Tue, 4 Jun 2024 02:11:35 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. 
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) with UseZbb enabled and UseZba, UseZbs not enabled
>> Original:
>>
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
>> Seco...
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix Code format

Thanks for updating! Although the fix improves the performance for testNegative63/64, it seems to bring some regression for testNegative55-62; in this sense the fix should not be taken. I'll take another look, sorry for the long wait.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2159035516 From jsjolen at openjdk.org Mon Jun 10 21:26:32 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 10 Jun 2024 21:26:32 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v10] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
> > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sjölen has updated the pull request incrementally with six additional commits since the last revision: - Use new interface in native call stack storage - Make it impossible for external users to create I()s - Make IFLA non-copyable - Clean up tests - Use initializer list - Small style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/9f7eaa65..91ef29e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=08-09 Stats: 192 lines in 3 files changed: 71 ins; 71 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Mon Jun 10 21:32:51 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 10 Jun 2024 21:32:51 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v11] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator,
`IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Right, GrowableArray.
- D'oh it returns a value - You can't default copy assignment operator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/91ef29e6..19f3b436 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=09-10 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Mon Jun 10 21:36:13 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 10 Jun 2024 21:36:13 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v9] In-Reply-To: References: Message-ID: On Fri, 7 Jun 2024 12:18:42 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>>
>> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>>
>> The results are as follows on linux-x64-slowdebug:
>>
>> Generate stacks... Done
>> Time taken with GrowableArray: 8341.240945
>> Time taken with CHeap: 12189.031318
>> Time taken with Arena: 8800.703092
>> Time taken with GrowableArray again: 8295.508829
>>
>> And on linux-x64:
>>
>> Time taken with GrowableArray: 8378.018135
>> Time taken with CHeap: 12437.347868
>> Time taken with Arena: 8758.064717
>> Time taken with GrowableArray again: 8391.076291
>>
>> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add include guards

I've got some stuff left to do:

1. Clean up the tests
2. Fix the maximum bound that you suggested

Btw, I figured it out: GitHub shows my comments twice: once as part of the review I'm making, and once separately where they should be. I make and submit reviews because that bunches my e-mails up into one long one for the mailing lists -- I think it does so at least.
------------- PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2108772798 From jsjolen at openjdk.org Mon Jun 10 21:36:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 10 Jun 2024 21:36:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v9] In-Reply-To: <0wzHsav13as-hrZeVpnwSt7QUPH-iabCbTEd3xmXUW8=.bfa2e209-56cb-42a9-ac34-93cab4b64a54@github.com> References: <0wzHsav13as-hrZeVpnwSt7QUPH-iabCbTEd3xmXUW8=.bfa2e209-56cb-42a9-ac34-93cab4b64a54@github.com> Message-ID: <6fQ2LjPbq8g-MrbOnKgy2gHhZPM5mLJJE2fiwwIHf98=.589efefb-b705-42c1-918f-884aac83ad15@github.com> On Mon, 10 Jun 2024 14:14:30 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Add include guards > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 68: > >> 66: char e[sizeof(E)]; >> 67: >> 68: BackingElement() { > > Init list? It's on its way, I haven't pushed all of my changes yet :-). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1633849677 From jsjolen at openjdk.org Mon Jun 10 21:36:13 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 10 Jun 2024 21:36:13 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v11] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 14:12:06 GMT, Thomas Stuefe wrote: >> Now we use access specifiers to deny access to the index, except for the allocator, which is a friend. We do want the index to be opaque, so we do this. On top of that, we use an `_owner` field in debug mode to assert that we don't free to the wrong allocator. > >> Now we use access specifiers to deny access to the index, except for the allocator, which is a friend. We do want the index to be opaque, so we do this. 
On top of that, we use an `_owner` field in debug mode to assert that we don't free to the wrong allocator.
>
> The last part I don't like. It causes `I` to quadruple in size in debug builds only. For something that is potentially used as a member in a bunch of different data structures, it can cause considerable deltas in memory layout between debug and release builds. I don't think that is a good thing. What you test in debug should be close to what's running at a customer.
>
> I also don't think it is particularly useful. I would instead do a simple range check on I. That gives you a part of the sanity checks without the negatives. Couple that with a sanity check on slot state (don't free free slots) and you are pretty well covered already.
>
> Other than that, I think using an own class to hide a simple i32 is unnecessarily complex, but I leave that up to you. If you do this, please take care of constness. And make it work for its money, e.g. by giving it an is_nil instead of exposing nil, and requiring the user to manually compare it.

Alright, I get the counterpoint re: memory usage of the slots. I can live with that, though I am a bit disappointed as I had double-free coverage in mind also :).

> Other than that, I think using an own class to hide a simple i32 is unnecessarily complex, but I leave that up to you. If you do this, please take care of constness. And make it work for its money, e.g. by giving it an is_nil instead of exposing nil, and requiring the user to manually compare it.

Yes, I'll make `_idx` const. `nil` is also exposed so that you can create nil pointers, which is important. I wanted to make the `I(int32_t idx)` constructor private, as only the allocator should construct pointers. This turned out not to be possible due to the design of `GrowableArray`.
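
For readers following the thread, here is a minimal sketch of the idea under discussion: a freelist allocator that hands out opaque 4-byte indices, with the range check and slot-state check suggested in the review. It is not the code from the PR. `std::vector` stands in for `GrowableArray`, and the names `IndexedFreeList`, `Index`, `Slot`, `at` and `capacity` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch, not the PR's implementation.
// Assumption: std::vector replaces GrowableArray as the backing store.
template <typename E>
class IndexedFreeList {
  // Slots are moved bytewise when the vector grows, so keep E trivially copyable.
  static_assert(std::is_trivially_copyable<E>::value, "E must be trivially copyable");

public:
  // Opaque 4-byte handle. Only the allocator can read or create a raw index.
  class Index {
    friend class IndexedFreeList<E>;
    std::int32_t _idx;
    explicit Index(std::int32_t idx) : _idx(idx) {}
  public:
    Index() : _idx(-1) {}                       // default-constructed handle is nil
    bool is_nil() const { return _idx < 0; }
    bool operator==(const Index& other) const { return _idx == other._idx; }
  };

private:
  struct Slot {
    alignas(E) unsigned char storage[sizeof(E)];
    std::int32_t next_free;  // next slot on the free list, or -1
    bool in_use;             // slot-state check: catches freeing a free slot
  };

  std::vector<Slot> _slots;
  std::int32_t _free_head = -1;

  bool in_range(Index i) const {
    return i._idx >= 0 && static_cast<std::size_t>(i._idx) < _slots.size();
  }

public:
  template <typename... Args>
  Index allocate(Args&&... args) {
    std::int32_t i = _free_head;
    if (i >= 0) {
      _free_head = _slots[i].next_free;         // reuse a freed slot
    } else {
      i = static_cast<std::int32_t>(_slots.size());
      _slots.emplace_back();                    // grow; the store never shrinks
    }
    Slot& s = _slots[i];
    ::new (static_cast<void*>(s.storage)) E(std::forward<Args>(args)...);
    s.next_free = -1;
    s.in_use = true;
    return Index(i);
  }

  void free(Index i) {
    if (i.is_nil()) return;
    assert(in_range(i) && "index out of range");
    Slot& s = _slots[i._idx];
    assert(s.in_use && "slot is already free");
    s.in_use = false;
    s.next_free = _free_head;                   // push the slot onto the free list
    _free_head = i._idx;
  }

  // The returned reference is invalidated by the next allocate() that grows the store.
  E& at(Index i) {
    assert(in_range(i) && _slots[i._idx].in_use);
    return *reinterpret_cast<E*>(_slots[i._idx].storage);
  }

  std::size_t capacity() const { return _slots.size(); }
};
```

In this sketch the handle stays 4 bytes in every build, and the debug-only checks live in the slot rather than in the handle, which is one way to read the reviewer's suggestion. Freeing a slot and allocating again returns the same index, so the store only grows when the free list is empty.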
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1633841703 From jsjolen at openjdk.org Mon Jun 10 21:54:48 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 10 Jun 2024 21:54:48 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v12] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Create a default constructor for GrowableArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/19f3b436..48659b6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=10-11 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From ccheung at openjdk.org Tue Jun 11 00:11:14 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 11 Jun 2024 00:11:14 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v8] In-Reply-To: References: <-pZLP33SmfOP-3djnyabR8eaNZqVem1sLJYWD7412Qc=.4c84c9ab-392e-4b03-a0c0-e8a9a679999b@github.com> Message-ID: On Mon, 10 Jun 2024 07:44:21 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> remove Arguments::perf_class_link() > > src/hotspot/share/interpreter/linkResolver.cpp line 1735: > >> 1733: >> 1734: PerfTraceTimedEvent timer(ClassLoader::perf_resolve_invokehandle_time(), >> 1735: ClassLoader::perf_resolve_invokehandle_count()); > > What does this do when the counters are not enabled? 
It should just return. See the following code in perfData.hpp:

    inline PerfTraceTimedEvent(PerfLongCounter* timerp, PerfLongCounter* eventp): PerfTraceTime(timerp), _eventp(eventp) {
      if (!UsePerfData || timerp == nullptr) { return; }

> src/hotspot/share/runtime/java.cpp line 160:
>
>> 158: }
>> 159:
>> 160: void log_vm_stats(outputStream *st) {
>
> I assume this generic name is because in the future it will print a lot more VM stats?

Yes.

> src/hotspot/share/runtime/java.cpp line 164:
>
>> 162: if (log.is_enabled()) {
>> 163: ClassLoader::print_counters(st);
>> 164: }
>
> Probably worth adding a comment here as to why we actually print to the passed in stream and not the log stream, given we check if the log stream is enabled. Someone could easily think this is a typo/bug.

I can change `log_vm_stats` to accept a `bool` argument so that the choice of `st` becomes clear.

    void log_vm_stats(bool use_tty) {
      LogStreamHandle(Info, perf, class, link) log;
      if (log.is_enabled()) {
        outputStream* st = use_tty ? tty : &log;
        ClassLoader::print_counters(st);
      }
    }

> src/hotspot/share/runtime/threads.cpp line 835:
>
>> 833: log.print_cr("At VM initialization completion:");
>> 834: log_vm_stats(&log);
>> 835: }
>
> If we are going to have more types of VM stats in the future, it is not clear how you will change this if-condition? Nor what stream you would pass in.

The if-condition could be something like:

    if (log_is_enabled(Info, perf, class, link) || log_is_enabled(Info, perf, xxx, yyy) || ...)

Regarding which stream to pass in, with my proposed change in `log_vm_stats` above, the current fix would look like this:

when calling from threads.cpp: `log_vm_stats(false /* use_tty */);`
when calling from java.cpp: `log_vm_stats(true /* use_tty */);`

Or do you prefer not having the `log_vm_stats` function and calling `ClassLoader::print_counters` directly? If so, we don't need the compound `if` conditions in the above.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1633995004 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1633995051 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1633995194 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1633995900 From jwaters at openjdk.org Tue Jun 11 00:20:16 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 11 Jun 2024 00:20:16 GMT Subject: RFR: 8331352: error: template-id not allowed for constructor/destructor in C++20 In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 02:01:01 GMT, Jan Kratochvil wrote: > When compiling trunk (819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5 2024-04-29) by gcc-14.0.1-0.15.fc40.x86_64 there are many errors: > > In file included from src/hotspot/share/memory/allocation.hpp:30, > from src/hotspot/share/ci/ciBaseObject.hpp:29, > from src/hotspot/share/ci/ciMetadata.hpp:28, > from src/hotspot/share/ci/ciType.hpp:28, > from src/hotspot/share/ci/ciKlass.hpp:28, > from src/hotspot/share/ci/ciArrayKlass.hpp:28, > from src/hotspot/share/ci/ciArray.hpp:28, > from src/hotspot/share/ci/compilerInterface.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.cpp:25: > src/hotspot/share/utilities/linkedlist.hpp:85:15: error: template-id not allowed for constructor in C++20 [-Werror=template-id-cdtor] > 85 | NONCOPYABLE(LinkedList); > | ^~~~~~~~~~~~~ > src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ?NONCOPYABLE? > 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */ > | ^ > src/hotspot/share/utilities/linkedlist.hpp:85:15: note: remove the ?< >? > 85 | NONCOPYABLE(LinkedList); > | ^~~~~~~~~~~~~ > src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ?NONCOPYABLE? 
> 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */
> | ^
>
> In file included from src/hotspot/share/gc/z/zGranuleMap.inline.hpp:30,
> from src/hotspot/share/gc/z/zForwardingTable.inline.hpp:32,
> from src/hotspot/share/gc/z/zHeap.inline.hpp:30,
> from src/hotspot/share/gc/z/zGeneration.inline.hpp:30,
> from src/hotspot/share/gc/z/zBarrier.inline.hpp:30,
> from src/hotspot/share/gc/z/zBarrierSet.inline.hpp:31,
> from src/hotspot/share/gc/shared/barrierSetConfig.inline.hpp:44,
> from src/hotspot/share/oops/access.inline.hpp:31,
> from src/hotspot/share/memory/iterator.inline.hpp:32,
> from src/hotspot/share/oops/oop.inline.hpp:31,
> from src/hotspot/share/compiler/abstractDisassembler.cpp:32:
> src/hotspot/share/gc/z/zArray.inline.hpp:99:21: error: template-id not allowed f...

Ironic that I'm now facing the same issue on Windows:

    C:/users/vertig0/downloads/eclipse-committers-2023-12-r-win32-x86_64/workspace/jdk/src/hotspot/os/windows/symbolengine.cpp:93:67: error: template-id not allowed for constructor in C++20 [-Werror=template-id-cdtor]
       93 | SimpleBufferWithFallback ()
          | ^
    C:/users/vertig0/downloads/eclipse-committers-2023-12-r-win32-x86_64/workspace/jdk/src/hotspot/os/windows/symbolengine.cpp:93:67: note: remove the '< >'
    cc1plus.exe: all warnings being treated as errors

I was going to suppress the warning in the Makefiles, but then remembered about this Pull Request. Should I send a changeset to fix this upstream or just suppress the warning in my Windows/gcc Port?
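
As an illustration of what the diagnostic is about, here is a small sketch of a class template whose constructor is declared with the plain class name, which is what the compiler's "remove the '< >'" note asks for. The class `FixedBuffer` and its members are hypothetical and are not taken from the HotSpot sources; the rejected spelling is shown only in a comment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class for illustration only; not the HotSpot code.
template <typename T, std::size_t N>
class FixedBuffer {
  T _data[N];
  std::size_t _size;

public:
  // Flagged by -Wtemplate-id-cdtor (an error in the builds above because
  // warnings are treated as errors):
  //   FixedBuffer<T, N>() : _data(), _size(0) {}
  // Accepted: inside the class, the bare name already means FixedBuffer<T, N>.
  FixedBuffer() : _data(), _size(0) {}

  // The same applies to deleted copy operations, as a NONCOPYABLE-style macro
  // would generate them when given the bare class name.
  FixedBuffer(FixedBuffer const&) = delete;
  FixedBuffer& operator=(FixedBuffer const&) = delete;

  // Appends v; returns false when the buffer is full.
  bool push(T const& v) {
    if (_size == N) return false;
    _data[_size++] = v;
    return true;
  }

  T const& at(std::size_t i) const {
    assert(i < _size);
    return _data[i];
  }

  std::size_t size() const { return _size; }
};
```

Only the declarator name of the constructor or destructor is affected; uses of the type elsewhere, such as `FixedBuffer<int, 2> b;`, keep their template arguments.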
------------- PR Comment: https://git.openjdk.org/jdk/pull/19009#issuecomment-2159536378 From fyang at openjdk.org Tue Jun 11 04:10:14 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Jun 2024 04:10:14 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3] In-Reply-To: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com> References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com> Message-ID: On Tue, 4 Jun 2024 02:11:35 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. >> >> ### Correctness testing: >> >> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) >> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) >> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` >> >> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs >> Original: >> >> Benchmark Mode Cnt Score Error Units >> SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op >> SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 
0.498 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op >> Seco... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix Code format src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3486: > 3484: // Both inline code and stub use specific registers and may jump from inline code/stub to stub, > 3485: // Ensure that the inline code and the stub are using the same registers. > 3486: // And which need to be declared registers in the C2-related instruct first. Maybe simply as: // Ensure that the inline code and the stub are using the same registers. // as we need to call the stub from inline code when there is a collision // in the hashed lookup in the secondary supers array. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1634141569 From kbarrett at openjdk.org Tue Jun 11 04:38:18 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 11 Jun 2024 04:38:18 GMT Subject: RFR: 8331352: error: template-id not allowed for constructor/destructor in C++20 In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 00:17:29 GMT, Julian Waters wrote: > Ironic that I'm now facing the same issue on Windows: > > ``` > C:/users/vertig0/downloads/eclipse-committers-2023-12-r-win32-x86_64/workspace/jdk/src/hotspot/os/windows/symbolengine.cpp:93:67: error: template-id not allowed for constructor in C++20 [-Werror=template-id-cdtor] > 93 | SimpleBufferWithFallback () > | ^ > C:/users/vertig0/downloads/eclipse-committers-2023-12-r-win32-x86_64/workspace/jdk/src/hotspot/os/windows/symbolengine.cpp:93:67: note: remove the '< >' > cc1plus.exe: all warnings being treated as errors > ``` > > I was going to suppress the warning in the Makefiles, but then remembered about this Pull Request. Should I send a changeset to fix this upstream or just suppress the warning in my Windows/gcc Port? I'm okay with having this fixed in openjdk. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19009#issuecomment-2159775466 From dholmes at openjdk.org Tue Jun 11 05:41:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 11 Jun 2024 05:41:12 GMT Subject: RFR: 8333133: Simplify QuickSort::sort In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:52:03 GMT, Kim Barrett wrote: > The "idempotent" argument is removed from that function, with associated > simplifications to the implementation. Callers are updated to remove that > argument. Callers that were providing a false value are unaffected in their > behavior. The 3 callers that were providing a true value to request the > associated feature are also unaffected (other than by being made faster), > because the arrays involved don't contain any equivalent pairs. 
>
> There are also some miscellaneous cleanups, including using the swap utility
> and fixing some comments.
>
> Testing: mach5 tier1-3

So .... IIUC the only code that would be affected by this change would be code that passes true, which could also have equivalent elements to sort, and which requires the sort order to always be the same regardless of the order the elements are found. I think only the archive related code cares about deterministic order, and package and module names should be unique, so this seems fine.

One pre-existing nit in a comment but otherwise looks good.

Thanks

src/hotspot/share/utilities/quickSort.hpp line 43:

> 41: // We swap these three values into the right place in the array. This
> 42: // means that this method not only returns the index of the pivot
> 43: // element. It also alters the array so that:

Pre-existing nit: this should be one sentence: "... element, it also ..."

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19464#pullrequestreview-2109357549
PR Review Comment: https://git.openjdk.org/jdk/pull/19464#discussion_r1634213493

From amitkumar at openjdk.org Tue Jun 11 05:41:47 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 11 Jun 2024 05:41:47 GMT
Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v6]
In-Reply-To:
References:
Message-ID:

> We need to move the popcnt instruction implementation out of the s390.ad file, as it is required by some methods present in [JDK-8331126](https://bugs.openjdk.org/browse/JDK-8331126).
>
>
> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3
> field is zero, a count of the number of one bits in each of the eight bytes of general register
> R2 is placed into the corresponding byte of general register R1. Each byte of general register
> R1 is an 8-bit binary integer in the range of 0-8.
> > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: address suggestion from Andrew ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/b9acd6d8..da124d8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=04-05 Stats: 101 lines in 4 files changed: 70 ins; 12 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From gcao at openjdk.org Tue Jun 11 06:04:56 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 11 Jun 2024 06:04:56 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: > Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. > This optimization depends on availability of the Zbb extension which has the cpop instruction. > > ### Correctness testing: > > - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) > - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) > - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs > Original: > > Benchmark Mode Cnt Score Error Units > SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op > SecondarySuperCacheInterContention.test avgt 15 646.087 ? 
32.587 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ? 1.066 ns/op > SecondaryS... Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Polish Code Comment - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Fix Code format - Fix for Hamlin comment - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Fix client VM build - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - 8332587: RISC-V: secondary_super_cache does not scale well ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19320/files - new: https://git.openjdk.org/jdk/pull/19320/files/0c7c9f59..d87b2aff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=02-03 Stats: 34412 lines in 654 files changed: 20431 ins; 11672 del; 2309 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From duke at openjdk.org Tue Jun 11 06:18:45 2024 From: duke at openjdk.org (Liming Liu) Date: Tue, 11 Jun 2024 06:18:45 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v11] In-Reply-To: References: Message-ID: > The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. 
Liming Liu has updated the pull request incrementally with three additional commits since the last revision: - Remove it from the problem list - Not to use MADV_POPULATE_WRITE on the tests - Revert changes in src for the two reasons: - UEK has fixed the compatibility with upstream kernels; - Pretouch behaviors will be covered by other tests. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18592/files - new: https://git.openjdk.org/jdk/pull/18592/files/cb2adb8d..5bbeaee9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=09-10 Stats: 126 lines in 4 files changed: 20 ins; 94 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18592.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18592/head:pull/18592 PR: https://git.openjdk.org/jdk/pull/18592 From gcao at openjdk.org Tue Jun 11 07:19:41 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 11 Jun 2024 07:19:41 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v5] In-Reply-To: References: Message-ID: > Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. > This optimization depends on availability of the Zbb extension which has the cpop instruction. > > ### Correctness testing: > > - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) > - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) > - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs > Original: > > Benchmark Mode Cnt Score Error Units > SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op > SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 
73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ? 1.066 ns/op > SecondaryS... 
Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Code Format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19320/files - new: https://git.openjdk.org/jdk/pull/19320/files/d87b2aff..e3a53408 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From duke at openjdk.org Tue Jun 11 07:21:43 2024 From: duke at openjdk.org (Liming Liu) Date: Tue, 11 Jun 2024 07:21:43 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v12] In-Reply-To: References: Message-ID: > The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. 
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Fix variable names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18592/files - new: https://git.openjdk.org/jdk/pull/18592/files/5bbeaee9..11167d87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=10-11 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18592.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18592/head:pull/18592 PR: https://git.openjdk.org/jdk/pull/18592 From amitkumar at openjdk.org Tue Jun 11 07:57:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Jun 2024 07:57:43 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v7] In-Reply-To: References: Message-ID: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. 
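To make the two behaviours in the quoted instruction description concrete, here is a small model in plain C++. It is only a sketch of the semantics, not HotSpot assembler code; the function names are hypothetical. The first function models the per-byte result (each byte in the range 0-8), the second folds those bytes into one total, and the third models the single 64-bit total (range 0 to 64).

```cpp
#include <cstdint>

// Models the behaviour without facility 3 (or with bit 0 of M3 zero):
// byte i of the result holds the number of one bits in byte i of x.
uint64_t popcnt_per_byte(uint64_t x) {
    uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        uint64_t byte = (x >> (8 * i)) & 0xFF;
        uint64_t count = 0;
        while (byte != 0) {
            count += byte & 1;
            byte >>= 1;
        }
        result |= count << (8 * i);
    }
    return result;
}

// Folds eight per-byte counts into one total. Each count is at most 8, so the
// running sums never exceed 64 and no byte overflows; after the multiply the
// top byte holds the sum of all eight bytes.
uint64_t popcnt_total_from_bytes(uint64_t per_byte) {
    return (per_byte * 0x0101010101010101ULL) >> 56;
}

// Models the behaviour with facility 3 and bit 0 of M3 set: one 64-bit total.
uint64_t popcnt_total(uint64_t x) {
    uint64_t count = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}
```

Code that only has the per-byte form needs the extra folding step to get a total, which is the work the single-total form avoids.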
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: corrects the regsiter for z_popcnt in pop_count_long_post_z15 method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/da124d8a..ebca276e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From amitkumar at openjdk.org Tue Jun 11 07:57:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Jun 2024 07:57:43 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v6] In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 05:41:47 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. 
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   address suggestion from Andrew

I got some test failure after last commit, working on them.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19509#issuecomment-2159910657

From amitkumar at openjdk.org Tue Jun 11 07:57:43 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 11 Jun 2024 07:57:43 GMT
Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v4]
In-Reply-To:
References: <8OgV6y1wcSkIFDNt5y7QdF53Gy3Hj4SAbv3src0K14g=.d26770d5-1b72-4104-9d84-a2aa260fb69f@github.com>
Message-ID:

On Mon, 10 Jun 2024 16:14:38 GMT, Andrew Haley wrote:

>> Okay I'm fine with your suggestion. But I only hope that nobody directly uses `z_popcnt` instruction. Otherwise this solution is not gonna prevent the bug you described. Maybe I should put a comment in `assembler_s390.hpp` file as well, to have a look at the helper method present in `macroAssembler_s390.hpp` file instead of using it vanilla.
>
> There's no need to hope, make `z_popcnt` protected.

done; please have a look at new commits;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634378072

From ayang at openjdk.org Tue Jun 11 08:24:39 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Tue, 11 Jun 2024 08:24:39 GMT
Subject: RFR: 8333962: Obsolete OldSize
Message-ID:

Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size.
-------------

Commit messages:
 - obsolete-old-size

Changes: https://git.openjdk.org/jdk/pull/19647/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19647&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333962
  Stats: 193 lines in 15 files changed: 8 ins; 168 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/19647.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19647/head:pull/19647

PR: https://git.openjdk.org/jdk/pull/19647

From jsjolen at openjdk.org Tue Jun 11 08:34:58 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 11 Jun 2024 08:34:58 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v13]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.
>
> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.
>
> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>
> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators.
This is available in the PR.
>
> The results are as follows on linux-x64-slowdebug:
>
>
> Generate stacks... Done
> Time taken with GrowableArray: 8341.240945
> Time taken with CHeap: 12189.031318
> Time taken with Arena: 8800.703092
> Time taken with GrowableArray again: 8295.508829
>
>
> And on linux-x64:
>
>
> Time taken with GrowableArray: 8378.018135
> Time taken with CHeap: 12437.347868
> Time taken with Arena: 8758.064717
> Time taken with GrowableArray again: 8391.076291
>
>
> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Give up on the fancy constructors, revert to implicitly defined ones

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18979/files
  - new: https://git.openjdk.org/jdk/pull/18979/files/48659b6e..ede76aea

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=11-12

  Stats: 12 lines in 1 file changed: 0 ins; 12 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18979.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979

PR: https://git.openjdk.org/jdk/pull/18979

From aph at openjdk.org Tue Jun 11 09:18:17 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 11 Jun 2024 09:18:17 GMT
Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 07:57:43 GMT, Amit Kumar wrote:

>> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126)
>>
>>
>> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3
>> field is zero, a count of the number of one bits in each of the eight bytes of general register
>> R2 is placed into
the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > corrects the regsiter for z_popcnt in pop_count_long_post_z15 method src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5871: > 5869: } > 5870: > 5871: void MacroAssembler::pop_count_long_post_z15(Register r_dst, Register r_src) { I know the name was my suggestion, but perhaps `pop_count_long_ext3` and `pop_count_long_pre_ext3` would be better. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5877: > 5875: z_popcnt(r_dst, r_src, 8); > 5876: } else { > 5877: stop("this hardware doesn't support miscellaneous-instruction-extensions facility 3, still pop_count_long_post_z15 is used"); Use a `guarantee()` here instead of the if/then/else blocks. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634498158
PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634494732

From amitkumar at openjdk.org Tue Jun 11 10:06:17 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 11 Jun 2024 10:06:17 GMT
Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 09:15:41 GMT, Andrew Haley wrote:

>> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   corrects the regsiter for z_popcnt in pop_count_long_post_z15 method
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5871:
>
>> 5869: }
>> 5870:
>> 5871: void MacroAssembler::pop_count_long_post_z15(Register r_dst, Register r_src) {
>
> I know the name was my suggestion, but perhaps `pop_count_long_ext3` and `pop_count_long_pre_ext3` would be better.

what about using with and without like this "pop_count_with_ext3"?

But I still think that current naming is also not that bad.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634575049

From amitkumar at openjdk.org Tue Jun 11 10:39:30 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 11 Jun 2024 10:39:30 GMT
Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v8]
In-Reply-To:
References:
Message-ID: <2H2TRZYhF6WWmQMAyWdop_kH7mg6rcMbs9_mcwvpb_U=.189c0876-47e0-444a-894e-e2c8d61e5417@github.com>

> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126)
>
>
> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3
> field is zero, a count of the number of one bits in each of the eight bytes of general register
> R2 is placed into the corresponding byte of general register R1.
Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: removes if and uses guarantee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/ebca276e..712b9b45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=06-07 Stats: 11 lines in 1 file changed: 0 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From rehn at openjdk.org Tue Jun 11 11:17:23 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 11 Jun 2024 11:17:23 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. 
> > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. > > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Remove tmp file - Prepare for dynamic NativeCall size - Only allow one calling convetion, i.e. fixed sized - Merge branch 'master' into 8332689 - Review comments - Move shart/far code to cpp - Cleanup - ... 
and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=09 Stats: 908 lines in 16 files changed: 652 ins; 162 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From rehn at openjdk.org Tue Jun 11 11:22:13 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 11 Jun 2024 11:22:13 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v8] In-Reply-To: References: <4TVE4G5mkc7U0LI7pj_rd6jbG0vGyFTXWhwX86wA4tg=.3f823c2e-136f-431f-8e06-d61c652f36d2@github.com> <_MuwK3i7Ru8buBWkrto34BYlmlfDFPKDBwsp3Vaq2S8=.93e16dca-260a-4e4d-b414-fc66412bd5c7@github.com> Message-ID: On Mon, 10 Jun 2024 05:57:04 GMT, Robbin Ehn wrote: > > > _Mailing list message from [Andrew Haley](mailto:aph-open at littlepinkcloud.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.org):_ > > > On 5/29/24 15:28, Robbin Ehn wrote: > > > > On some CPUs L1D and L1I can't contain the same cache line, which means > > > > the tramopline stub can bounce from L1I->L1D->L1I, which is > > > > expensive. > > > > > > > > > Wouldn't it be a lot easier simply to put the target address loaded by the trampoline into the constant pool? > > > > > > Seem to me that will be more cleaner than the current solution (`MacroAssembler::emit_address_stub` which uses `trampoline_stub_Relocation::spec` relocation holder but emits an 'address stub' instead of a real trampline). And I see PPC is putting the entry point as a constant into the constant pool [1] when emitting a call with trampoline stub. > > [1] [MacroAssembler::emit_address_stub](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L1308) > > This was just a bit easier as I have both cases. I'll look into cp. 
> > Thanks

The ppc version is not possible, as the CP offset is stored in the instruction stream.
We need to use runtime_call_w_cp_type, as s390 does, to keep track of the offset when the CP moves around during growth/relocation.
It needs a bit of work. What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2160492753

From aph at openjdk.org Tue Jun 11 11:44:15 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 11 Jun 2024 11:44:15 GMT
Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 10:03:46 GMT, Amit Kumar wrote:

>> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5871:
>>
>>> 5869: }
>>> 5870:
>>> 5871: void MacroAssembler::pop_count_long_post_z15(Register r_dst, Register r_src) {
>>
>> I know the name was my suggestion, but perhaps `pop_count_long_ext3` and `pop_count_long_pre_ext3` would be better.
>
> what about using with and without like this "pop_count_with_ext3"?
>
> But I still think that current naming is also not that bad.

"pop_count_with_ext3" is fine. Strictly speaking, "post" means "after", so is a bit confusing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634711365 From aph at openjdk.org Tue Jun 11 11:49:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Jun 2024 11:49:17 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v8] In-Reply-To: <2H2TRZYhF6WWmQMAyWdop_kH7mg6rcMbs9_mcwvpb_U=.189c0876-47e0-444a-894e-e2c8d61e5417@github.com> References: <2H2TRZYhF6WWmQMAyWdop_kH7mg6rcMbs9_mcwvpb_U=.189c0876-47e0-444a-894e-e2c8d61e5417@github.com> Message-ID: On Tue, 11 Jun 2024 10:39:30 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > removes if and uses guarantee src/hotspot/cpu/s390/macroAssembler_s390.hpp line 1032: > 1030: > 1031: // Up for an adventure ? use these instructions :) > 1032: // Should be only used when you're sure that instruction will "only" run on hardware older than z15 Suggestion: // For legacy (pre-z15) use, but will work on all supported s390 implementations. 
src/hotspot/cpu/s390/macroAssembler_s390.hpp line 1037: > 1035: > 1036: // Should be used in a case, where you're sure that instruction will never touch a hardware older than z15 > 1037: // it will only run on either a z15 machine or successor of it Suggestion: // Only for use on z15 or later s390 implementations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634716109 PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634716972 From aph at openjdk.org Tue Jun 11 11:54:14 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Jun 2024 11:54:14 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v8] In-Reply-To: <2H2TRZYhF6WWmQMAyWdop_kH7mg6rcMbs9_mcwvpb_U=.189c0876-47e0-444a-894e-e2c8d61e5417@github.com> References: <2H2TRZYhF6WWmQMAyWdop_kH7mg6rcMbs9_mcwvpb_U=.189c0876-47e0-444a-894e-e2c8d61e5417@github.com> Message-ID: On Tue, 11 Jun 2024 10:39:30 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > removes if and uses guarantee src/hotspot/cpu/s390/macroAssembler_s390.hpp line 1027: > 1025: > 1026: // if you're unsure whether your instruction will run on older hardware or newer hardware than Z15 > 1027: // then please use these instruction to avoid the compatibility issues Suggestion: // These generate optimized code for all supported s390 implementations, and are preferred for most uses. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634727795 From amitkumar at openjdk.org Tue Jun 11 12:17:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Jun 2024 12:17:43 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v9] In-Reply-To: References: Message-ID: <9vV7WcXhb9RJlAhz_O8UH7ft4shcJTeJWT5-E91tdYE=.4d4baa53-4f7e-45b8-8ea0-94e8722410aa@github.com> > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. 
Amit Kumar has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/cpu/s390/macroAssembler_s390.hpp Co-authored-by: Andrew Haley - Update src/hotspot/cpu/s390/macroAssembler_s390.hpp Co-authored-by: Andrew Haley - Update src/hotspot/cpu/s390/macroAssembler_s390.hpp Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/712b9b45..cda1046d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=07-08 Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From amitkumar at openjdk.org Tue Jun 11 12:35:46 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 11 Jun 2024 12:35:46 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v10] In-Reply-To: References: Message-ID: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. 
>
>
> Performed tier1 test on fastdebug build and didn't see any regression.

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  changes pre_z15 -> without_ext3 and post_z15 -> with_ext3

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19509/files
  - new: https://git.openjdk.org/jdk/pull/19509/files/cda1046d..5a39a8b7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=08-09

  Stats: 26 lines in 3 files changed: 0 ins; 0 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/19509.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509

PR: https://git.openjdk.org/jdk/pull/19509

From amitkumar at openjdk.org Tue Jun 11 12:41:15 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 11 Jun 2024 12:41:15 GMT
Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 11:41:26 GMT, Andrew Haley wrote:

>> what about using with and without like this "pop_count_with_ext3"?
>>
>> But I still think that current naming is also not that bad.
>
> "pop_count_with_ext3" is fine. Strictly speaking, "post" means "after", so is a bit confusing.
Done, please have a look at the latest changes; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19509#discussion_r1634810646 From sspitsyn at openjdk.org Tue Jun 11 12:57:20 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Jun 2024 12:57:20 GMT Subject: [jdk23] RFR: 8333931: Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification Message-ID: Please, review a jdk23 backport of the: [8333931](https://bugs.openjdk.org/browse/JDK-8333931): Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification Thanks ------------- Commit messages: - Backport fe9c63cf73db7833646345e362cbda020ac403d1 Changes: https://git.openjdk.org/jdk/pull/19651/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19651&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333931 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19651.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19651/head:pull/19651 PR: https://git.openjdk.org/jdk/pull/19651 From cjplummer at openjdk.org Tue Jun 11 17:08:14 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 11 Jun 2024 17:08:14 GMT Subject: [jdk23] RFR: 8333931: Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 11:34:34 GMT, Serguei Spitsyn wrote: > Please, review a jdk23 backport of the: > [8333931](https://bugs.openjdk.org/browse/JDK-8333931): Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification > > Thanks Marked as reviewed by cjplummer (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19651#pullrequestreview-2111015465 From sspitsyn at openjdk.org Tue Jun 11 17:22:22 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Jun 2024 17:22:22 GMT Subject: [jdk23] RFR: 8333931: Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 11:34:34 GMT, Serguei Spitsyn wrote: > Please, review a jdk23 backport of the: > [8333931](https://bugs.openjdk.org/browse/JDK-8333931): Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification > > Thanks Thank you for review, Chris. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19651#issuecomment-2161256191 From sspitsyn at openjdk.org Tue Jun 11 17:22:22 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 11 Jun 2024 17:22:22 GMT Subject: [jdk23] Integrated: 8333931: Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 11:34:34 GMT, Serguei Spitsyn wrote: > Please, review a jdk23 backport of the: > [8333931](https://bugs.openjdk.org/browse/JDK-8333931): Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification > > Thanks This pull request has now been integrated. 
Changeset: b17a1c09 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/b17a1c092ff082b58d4e9ad64c516a49e4f3adb9 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8333931: Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification Reviewed-by: cjplummer Backport-of: fe9c63cf73db7833646345e362cbda020ac403d1 ------------- PR: https://git.openjdk.org/jdk/pull/19651 From mli at openjdk.org Tue Jun 11 20:35:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Jun 2024 20:35:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Fri, 7 Jun 2024 07:14:47 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 981: >> >>> 979: } >>> 980: >>> 981: void MacroAssembler::load_link(const address source, Register temp) { >> >> maybe modify to `load_jump_link` or `load_link_jump`? > > I am considering names like `indirect_jump_link` :-) I'm not sure, but it would be better to have `jump` in its name; just `load` is misleading. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1635450759 From mli at openjdk.org Tue Jun 11 20:56:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Jun 2024 20:56:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 11:17:23 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. 
>> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Remove tmp file > - Prepare for dynamic NativeCall size > - Only allow one calling convetion, i.e.
fixed sized > - Merge branch 'master' into 8332689 > - Review comments > - Move shart/far code to cpp > - Cleanup > - ... and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 986: > 984: assert_cond(source != nullptr); > 985: int64_t distance = source - pc(); > 986: assert(is_simm32(distance), "Must be"); seems load_link can jump to about +/-2G dest from pc, jump_link seems support full address range jump (e.g. 48 bits)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1635453818 From dholmes at openjdk.org Wed Jun 12 01:49:18 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Jun 2024 01:49:18 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:52:15 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries, in a number of tests like > jdk/jfr/event/runtime/TestShutdownEvent.jtr > jdk/jfr/jvm/TestDumpOnCrash.jtr > we get those ubsan-errors : > > src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' > #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 > #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 > #2 0x7f0bd0502e7b () > #3 0x7f0bd04fe01f () > #4 0x7f0bd04fe01f () > #5 0x7f0bd04fe525 () > #6 0x7f0bd04f6c85 () > #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 > #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 > #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 > #10 0x7f0beed32cf8 in invokeStaticMainWithArgs src/java.base/share/native/libjli/java.c:418 > #11 0x7f0beed35894 in JavaMain src/java.base/share/native/libjli/java.c:623 > #12 0x7f0beed3cf68 
in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 > #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > > Looks like we use unsafe to put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). src/hotspot/share/prims/unsafe.cpp line 249: > 247: #if defined(__clang__) || defined(__GNUC__) > 248: __attribute__((no_sanitize("undefined"))) > 249: #endif Can we hide this in a macro like `SUPPRESS_UBSAN_WARNING`? If it turns out we need to do this in a few places then it will look nicer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19630#discussion_r1635677103 From dholmes at openjdk.org Wed Jun 12 02:42:16 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Jun 2024 02:42:16 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v8] In-Reply-To: References: <-pZLP33SmfOP-3djnyabR8eaNZqVem1sLJYWD7412Qc=.4c84c9ab-392e-4b03-a0c0-e8a9a679999b@github.com> Message-ID: On Tue, 11 Jun 2024 00:06:53 GMT, Calvin Cheung wrote: >> src/hotspot/share/runtime/java.cpp line 164: >> >>> 162: if (log.is_enabled()) { >>> 163: ClassLoader::print_counters(st); >>> 164: } >> >> Probably worth adding a comment here as to why we actually print to the passed in stream and not the log stream., given we check if the log stream is enabled. Someone could easily think this is a typo/bug. > > I can change `log_vm_stats` to accept a `bool` argument so that the the `st` becomes clear. > > > void log_vm_stats(bool use_tty) { > LogStreamHandle(Info, perf, class, link) log; > if (log.is_enabled()) { > outputStream* st = use_tty ? tty : &log; > ClassLoader::print_counters(st); > } > } That makes the tty usage a lot clearer - thanks. 
>> src/hotspot/share/runtime/threads.cpp line 835: >> >>> 833: log.print_cr("At VM initialization completion:"); >>> 834: log_vm_stats(&log); >>> 835: } >> >> If we are going to have more types of VM stats in the future, it is not clear how you will change this if-condition? Nor what stream you would pass in. ??? > > The if-condition could be something like: > > > if (log_is_enabled(Info, perf, class, link) || > log_is_enabled(Info, perf, xxx, yyy) || > ...) > > > Regarding which stream to pass in, with my proposed change in `log_vm_stats` above, the current fix would look like when calling from threads.cpp: > `log_vm_stats(false /* use_tty */);` > when calling from java.cpp: > `log_vm_stats(true /* use_tty */);` > > Or do you prefer not having the `log_vm_stats` function and calling `ClassLoader::print_counters` directly? > If so, we don't need the compound `if` conditions in the above. The problem is that the two different logging configurations could have been given different destinations and need not write to the same "stream". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1635735627 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1635735421 From sjayagond at openjdk.org Wed Jun 12 02:56:22 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Wed, 12 Jun 2024 02:56:22 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v7] In-Reply-To: References: Message-ID: <0WCRTLCKLlYjvy10RuGv1F2ZSegn4fRgAzHUBrRc7-I=.71e5f243-40db-435e-a26c-c6b1fd8a3c63@github.com> On Tue, 26 Mar 2024 15:10:37 GMT, Sidraya Jayagond wrote: >> This PR Adds SIMD support on s390x. > > Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: > > PopCountVI supported by z14 onwards. Commenting to keep PR Open. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2162006435 From amitkumar at openjdk.org Wed Jun 12 04:45:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 12 Jun 2024 04:45:43 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v11] In-Reply-To: References: Message-ID: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. 
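The two result formats quoted above can be illustrated in portable C++. This is only a sketch of the semantics as described in the quoted text, not the s390 code from the PR; the function names `popcnt_per_byte`, `popcnt_total` and `sum_byte_counts` are hypothetical and chosen for illustration.

```cpp
#include <cstdint>

// Per-byte form (as described for the case where bit 0 of M3 is zero):
// each byte of the result holds the number of one bits (0..8) found in
// the corresponding byte of the input.
uint64_t popcnt_per_byte(uint64_t x) {
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t byte = (x >> (8 * i)) & 0xFF;
        uint64_t count = 0;
        while (byte != 0) {
            count += byte & 1;
            byte >>= 1;
        }
        result |= count << (8 * i);
    }
    return result;
}

// Total form (as described for the case where bit 0 of M3 is one):
// a single count in the range 0..64 for the whole 64-bit value.
uint64_t popcnt_total(uint64_t x) {
    uint64_t count = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        count++;
    }
    return count;
}

// Fold the eight per-byte counts into one total. The multiplication adds
// all bytes into the top byte; the sum is at most 64, so it cannot overflow.
uint64_t sum_byte_counts(uint64_t per_byte) {
    return (per_byte * 0x0101010101010101ULL) >> 56;
}
```

Without the total form, a caller has to add the folding step (here `sum_byte_counts`) after the per-byte count; with it, the single result is already the full count.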
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: comment is not relevant anymore ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19509/files - new: https://git.openjdk.org/jdk/pull/19509/files/5a39a8b7..5ae34f83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19509&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19509/head:pull/19509 PR: https://git.openjdk.org/jdk/pull/19509 From ccheung at openjdk.org Wed Jun 12 05:23:41 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 12 Jun 2024 05:23:41 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v9] In-Reply-To: References: Message-ID: > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. 
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: remove log_vm_stats() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18790/files - new: https://git.openjdk.org/jdk/pull/18790/files/c62f5e4f..8a46e632 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=07-08 Stats: 13 lines in 3 files changed: 2 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From ccheung at openjdk.org Wed Jun 12 05:30:15 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 12 Jun 2024 05:30:15 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v8] In-Reply-To: References: <-pZLP33SmfOP-3djnyabR8eaNZqVem1sLJYWD7412Qc=.4c84c9ab-392e-4b03-a0c0-e8a9a679999b@github.com> Message-ID: On Wed, 12 Jun 2024 02:39:39 GMT, David Holmes wrote: >> The if-condition could be something like: >> >> >> if (log_is_enabled(Info, perf, class, link) || >> log_is_enabled(Info, perf, xxx, yyy) || >> ...) >> >> >> Regarding which stream to pass in, with my proposed change in `log_vm_stats` above, the current fix would look like when calling from threads.cpp: >> `log_vm_stats(false /* use_tty */);` >> when calling from java.cpp: >> `log_vm_stats(true /* use_tty */);` >> >> Or do you prefer not having the `log_vm_stats` function and calling `ClassLoader::print_counters` directly? >> If so, we don't need the compound `if` conditions in the above. > > The problem is that the two different logging configurations could have been given different destinations and need not write to the same "stream". I think it's better not to have `log_vm_stats` but calling `ClassLoader::print_counters` directly. Otherwise, in `log_vm_stats`, it needs to check every -Xlog:perf+... 
tag like the following: void log_vm_stats(bool use_tty) { LogStreamHandle(Info, perf, class, link) log; if (log.is_enabled()) { outputStream* st = use_tty ? tty : &log; ClassLoader::print_counters(st); } LogStreamHandle(Info, perf, xxx, yyy) log2; if (log2.is_enabled()) { outputStream* st = use_tty ? tty : &log2; XXX::print_counters(st); } ... } I've pushed another commit without `log_vm_stats`. Also checked the performance using the PetClinic app which loads more than 17,000 classes during boot up. Not much performance difference was observed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1635834291 From mbaesken at openjdk.org Wed Jun 12 07:05:14 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 12 Jun 2024 07:05:14 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 01:46:57 GMT, David Holmes wrote: >> When running with ubsan enabled binaries, in a number of tests like >> jdk/jfr/event/runtime/TestShutdownEvent.jtr >> jdk/jfr/jvm/TestDumpOnCrash.jtr >> we get those ubsan-errors : >> >> src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' >> #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 >> #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 >> #2 0x7f0bd0502e7b () >> #3 0x7f0bd04fe01f () >> #4 0x7f0bd04fe01f () >> #5 0x7f0bd04fe525 () >> #6 0x7f0bd04f6c85 () >> #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 >> #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 >> #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 >> #10 0x7f0beed32cf8 in invokeStaticMainWithArgs src/java.base/share/native/libjli/java.c:418 >> #11 0x7f0beed35894 
in JavaMain src/java.base/share/native/libjli/java.c:623 >> #12 0x7f0beed3cf68 in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 >> #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> >> Looks like we use unsafe to put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). > > src/hotspot/share/prims/unsafe.cpp line 249: > >> 247: #if defined(__clang__) || defined(__GNUC__) >> 248: __attribute__((no_sanitize("undefined"))) >> 249: #endif > > Can we hide this in a macro like `SUPPRESS_UBSAN_WARNING`? If it turns out we need to do this in a few places then it will look nicer. Sounds like a good idea. See also the discussion about `ATTRIBUTE_NO_UBSAN` here https://github.com/openjdk/jdk/pull/19597 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19630#discussion_r1635921287 From stuefe at openjdk.org Wed Jun 12 07:13:30 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 12 Jun 2024 07:13:30 GMT Subject: RFR: 8333994: NMT: call stacks should show source information Message-ID: Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. 
------------- Commit messages: - exclude macos from testing source info - copyrights - test - JDK-8333994-NMT-call-stacks-should-show-source-information Changes: https://git.openjdk.org/jdk/pull/19655/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333994 Stats: 39 lines in 2 files changed: 19 ins; 5 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19655.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19655/head:pull/19655 PR: https://git.openjdk.org/jdk/pull/19655 From dnsimon at openjdk.org Wed Jun 12 07:20:28 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 12 Jun 2024 07:20:28 GMT Subject: RFR: 8280481: Duplicated stubs to interpreter for static calls In-Reply-To: References: <9N1GcHDRvyX1bnPrRcyw96zWIgrrAm4mfrzp8dQ-BBk=.6d55c5fd-7d05-4058-99b6-7d40a92450bf@github.com> Message-ID: On Tue, 30 Aug 2022 09:04:21 GMT, Evgeny Astigeevich wrote: >>> Hi @eastig , I'd like to ask you how to get the experiment results, aka. `Saved bytes`, `Nmethods with shared stubs`,`Final # of nmethods`. Thank you! >> >> You can get `Final # of nmethods` with `-XX:+PrintCodeCache`. >> To get `Saved bytes`, `Nmethods with shared stubs` you need to instrument `emit_shared_stubs_to_interp` to count shared stubs and nmethods sharing them. > >> Hi @eastig , >> I would like to recurring your experimental data and I would be very grateful if you could provide a small patch to help me get the result of `Saved bytes` and `Nmethods with shared stubs`. >> Thank you! 
> > > diff --git a/src/hotspot/share/asm/codeBuffer.inline.hpp b/src/hotspot/share/asm/codeBuffer.inline.hpp > index 045cff13f25..9af26730cbd 100644 > --- a/src/hotspot/share/asm/codeBuffer.inline.hpp > +++ b/src/hotspot/share/asm/codeBuffer.inline.hpp > @@ -45,6 +45,7 @@ bool emit_shared_stubs_to_interp(CodeBuffer* cb, SharedStubToInterpRequests* sha > }; > shared_stub_to_interp_requests->sort(by_shared_method); > MacroAssembler masm(cb); > + bool has_shared = false; > for (int i = 0; i < shared_stub_to_interp_requests->length();) { > address stub = masm.start_a_stub(CompiledStaticCall::to_interp_stub_size()); > if (stub == NULL) { > @@ -53,13 +54,22 @@ bool emit_shared_stubs_to_interp(CodeBuffer* cb, SharedStubToInterpRequests* sha > } > > ciMethod* method = shared_stub_to_interp_requests->at(i).shared_method(); > + int shared = 0; > do { > address caller_pc = cb->insts_begin() + shared_stub_to_interp_requests->at(i).call_offset(); > masm.relocate(static_stub_Relocation::spec(caller_pc), relocate_format); > ++i; > + ++shared; > } while (i < shared_stub_to_interp_requests->length() && shared_stub_to_interp_requests->at(i).shared_method() == method); > masm.emit_static_call_stub(); > masm.end_a_stub(); > + if (UseNewCode && shared > 1) { > + has_shared = true; > + tty->print_cr("Saved: %d", (shared - 1) * CompiledStaticCall::to_interp_stub_size()); > + } > + } > + if (has_shared) { > + tty->print_cr("nm_has_shared"); > } > return true; > } > > > You will need to use `-XX:+UseNewCode` in your runs. > `grep nm_has_shared run.log | wc -l` is a number of nmethods having a shared stub. > `grep Saved: run.log | awk '{print $2}' | grep -o '[0-9]*' | paste -s -d+ - | bc` prints a number of saved bytes. @eastig as I understand, this optimization is about saving code cache memory. The sharing is within an nmethod, not across nmethods, correct? I'm trying to prioritize an effort to adopt this optimization in Graal. 
In addition to the numbers you present for code cache bytes saved in the benchmarks, can you say anything about how much that is relative to the code cache used in the benchmarks? ------------- PR Comment: https://git.openjdk.org/jdk/pull/8816#issuecomment-2162285030 From fyang at openjdk.org Wed Jun 12 07:45:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Jun 2024 07:45:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v8] In-Reply-To: References: <4TVE4G5mkc7U0LI7pj_rd6jbG0vGyFTXWhwX86wA4tg=.3f823c2e-136f-431f-8e06-d61c652f36d2@github.com> <_MuwK3i7Ru8buBWkrto34BYlmlfDFPKDBwsp3Vaq2S8=.93e16dca-260a-4e4d-b414-fc66412bd5c7@github.com> Message-ID: On Tue, 11 Jun 2024 11:19:50 GMT, Robbin Ehn wrote: > > > > _Mailing list message from [Andrew Haley](mailto:aph-open at littlepinkcloud.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.org):_ > > > > On 5/29/24 15:28, Robbin Ehn wrote: > > > > > On some CPUs L1D and L1I can't contain the same cache line, which means > > > > > the trampoline stub can bounce from L1I->L1D->L1I, which is > > > > > expensive. > > > > > > > > > > > > Wouldn't it be a lot easier simply to put the target address loaded by the trampoline into the constant pool? > > > > > > > > > Seems to me that would be cleaner than the current solution (`MacroAssembler::emit_address_stub` which uses `trampoline_stub_Relocation::spec` relocation holder but emits an 'address stub' instead of a real trampoline). And I see PPC is putting the entry point as a constant into the constant pool [1] when emitting a call with trampoline stub. > > > [1] [MacroAssembler::emit_address_stub](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/ppc.ad#L1308) > > > > > > This was just a bit easier as I have both cases. I'll look into cp. > > Thanks > > ppc version is not possible as CP offset is stored in instruction stream. 
We need to use runtime_call_w_cp_type, as s390, to keep track of the offset when CP moves around during growth/relocation. It needs a bit of work. What do you think? Ah, that doesn't seem to be easier than before ~. I am not familiar with s390 and I didn't see where it updates the target address in CP. I will take another look at our current approach and see if there are further improvements we can make. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2162327345 From aph at openjdk.org Wed Jun 12 09:15:16 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Jun 2024 09:15:16 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v11] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 04:45:43 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comment is not relevant anymore Yes, thanks. That looks very clear now. ------------- Marked as reviewed by aph (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19509#pullrequestreview-2112431032 From amitkumar at openjdk.org Wed Jun 12 09:21:17 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 12 Jun 2024 09:21:17 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v11] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 04:45:43 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comment is not relevant anymore Thanks Andrew, @RealLucy would you like to take another look ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19509#issuecomment-2162523435 From sgehwolf at openjdk.org Wed Jun 12 09:49:15 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 12 Jun 2024 09:49:15 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v4] In-Reply-To: References: Message-ID: <52rFNWs8r1zHN5eKvb6XfkT6rLtlFRNiKbmaDZbUyE0=.70728fb6-14cf-4d93-bb41-bae48d3d193f@github.com> On Fri, 7 Jun 2024 12:59:26 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests.
Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: > > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - Fix tests > - Implement Metrics.isContainerized() > - Some clean-up > - Drop cgroups testing on plain Linux > - ... and 3 more: https://git.openjdk.org/jdk/compare/40b2fbd8...02884c70 [tools/javac/annotations/typeAnnotations/api/ArrayCreationTree](https://github.com/jerboaa/jdk/actions/runs/9417350160#user-content-tools_javac_annotations_typeannotations_api_arraycreationtree) test failure in GHA on 32 bit Linux seems unrelated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2162580613 From lucy at openjdk.org Wed Jun 12 10:19:19 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 12 Jun 2024 10:19:19 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:52:15 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries, in a number of tests like > jdk/jfr/event/runtime/TestShutdownEvent.jtr > jdk/jfr/jvm/TestDumpOnCrash.jtr > we get those ubsan-errors : > > src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' > #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 > #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 > #2 0x7f0bd0502e7b () > #3 0x7f0bd04fe01f () > #4 0x7f0bd04fe01f () > #5 0x7f0bd04fe525 () > #6 0x7f0bd04f6c85 () > #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 > #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 > #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 > #10 0x7f0beed32cf8 in invokeStaticMainWithArgs src/java.base/share/native/libjli/java.c:418 > #11 0x7f0beed35894 in JavaMain src/java.base/share/native/libjli/java.c:623 > #12 0x7f0beed3cf68 in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 > #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > > Looks like we use unsafe to put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). LGTM. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19630#pullrequestreview-2112584541 From lucy at openjdk.org Wed Jun 12 10:19:20 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 12 Jun 2024 10:19:20 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 07:02:17 GMT, Matthias Baesken wrote: >> src/hotspot/share/prims/unsafe.cpp line 249: >> >>> 247: #if defined(__clang__) || defined(__GNUC__) >>> 248: __attribute__((no_sanitize("undefined"))) >>> 249: #endif >> >> Can we hide this in a macro like `SUPPRESS_UBSAN_WARNING`? If it turns out we need to do this in a few places then it will look nicer. > > Sounds like a good idea. See also the discussion about `ATTRIBUTE_NO_UBSAN` here https://github.com/openjdk/jdk/pull/19597 I like the idea of an encapsulating macro as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19630#discussion_r1636199928 From amitkumar at openjdk.org Wed Jun 12 10:47:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 12 Jun 2024 10:47:15 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v11] In-Reply-To: References: Message-ID: <69xIVO2-T6AoJX7icuDEGf_0uORPTOnxlL7vWgeWsJg=.c35c6172-12b0-46a6-84fd-ece28b0fd680@github.com> On Wed, 12 Jun 2024 04:45:43 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. 
>> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comment is not relevant anymore I ran `tier1` on `Release` and `fastdebug` VMs. I do not see any regression. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19509#issuecomment-2162689238 From mbaesken at openjdk.org Wed Jun 12 11:12:12 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 12 Jun 2024 11:12:12 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:52:15 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries, in a number of tests like > jdk/jfr/event/runtime/TestShutdownEvent.jtr > jdk/jfr/jvm/TestDumpOnCrash.jtr > we get those ubsan-errors : > > src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' > #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 > #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 > #2 0x7f0bd0502e7b () > #3 0x7f0bd04fe01f () > #4 0x7f0bd04fe01f () > #5 0x7f0bd04fe525 () > #6 0x7f0bd04f6c85 () > #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 > #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 > #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 > #10 0x7f0beed32cf8 in invokeStaticMainWithArgs 
src/java.base/share/native/libjli/java.c:418 > #11 0x7f0beed35894 in JavaMain src/java.base/share/native/libjli/java.c:623 > #12 0x7f0beed3cf68 in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 > #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > > Looks like we use unsafe to put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). Hi Lutz, thanks for the review ! I plan to do the ATTRIBUTE_NO_UBSAN change in a follow up for various code locations . ------------- PR Comment: https://git.openjdk.org/jdk/pull/19630#issuecomment-2162737100 From jsjolen at openjdk.org Wed Jun 12 12:19:56 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 12 Jun 2024 12:19:56 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v14] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. 
It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Const removal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/ede76aea..63413da0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=12-13 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From amitkumar at openjdk.org Wed Jun 12 13:27:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 12 Jun 2024 13:27:25 GMT Subject: RFR: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities [v11] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 04:45:43 GMT, Amit Kumar wrote: >> We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) >> >> >> When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 >> field is zero, a count of the number of one bits in each of the eight bytes of general register >> R2 is placed into the corresponding byte of general register R1. Each byte of general register >> R1 is an 8-bit binary integer in the range of 0-8. >> >> >> >> When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field >> is one, a count of the total number of one bits in the 64-bit general register R2 is placed into >> general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. >> >> >> Performed tier1 test on fastdebug build and didn't see any regression.
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comment is not relevant anymore Ok; Let's ship it; ------------- PR Comment: https://git.openjdk.org/jdk/pull/19509#issuecomment-2163004278 From amitkumar at openjdk.org Wed Jun 12 13:27:26 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 12 Jun 2024 13:27:26 GMT Subject: Integrated: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities In-Reply-To: References: Message-ID: On Sat, 1 Jun 2024 13:15:45 GMT, Amit Kumar wrote: > We need to move popcnt instruction implementation out of s390.ad file as it is required to be required some methods present in [JDK-8331126.](https://bugs.openjdk.org/browse/JDK-8331126) > > > When the miscellaneous-instruction-extensions facility 3 is not installed or bit 0 of the M3 > field is zero, a count of the number of one bits in each of the eight bytes of general register > R2 is placed into the corresponding byte of general register R1. Each byte of general register > R1 is an 8-bit binary integer in the range of 0-8. > > > > When the miscellaneous-instruction-extensions facility 3 is installed and bit 0 of the M3 field > is one, a count of the total number of one bits in the 64-bit general register R2 is placed into > general register R1. The result is a 64-bit unsigned integer in the range 0 to 64. > > > Performed tier1 test on fastdebug build and didn't see any regression. This pull request has now been integrated. 
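For readers unfamiliar with the two POPCNT result formats quoted in the description, the following is a small portable C++ model of them. It is only a sketch of the described semantics, not the s390 assembler code from the PR; the names `popcnt_per_byte`, `popcnt_total` and `sum_bytes` are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Model of the result format without facility 3 (or with M3 bit 0 == 0):
// byte i of the result holds the number of one bits (0..8) in byte i of x.
inline uint64_t popcnt_per_byte(uint64_t x) {
  uint64_t result = 0;
  for (int i = 0; i < 8; i++) {
    uint64_t byte = (x >> (8 * i)) & 0xFF;
    uint64_t count = 0;
    while (byte != 0) {
      count += byte & 1;
      byte >>= 1;
    }
    result |= count << (8 * i);
  }
  return result;
}

// Model of the result format with facility 3 and M3 bit 0 == 1:
// a single count (0..64) of all one bits in the 64-bit input.
inline uint64_t popcnt_total(uint64_t x) {
  uint64_t count = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    count++;
  }
  return count;
}

// One portable way to fold eight per-byte counts into a total: the
// multiplication accumulates all eight bytes into the top byte. Each
// partial sum is at most 64, so no carry spills between bytes.
inline uint64_t sum_bytes(uint64_t per_byte) {
  return (per_byte * 0x0101010101010101ULL) >> 56;
}
```

For any input `x`, `sum_bytes(popcnt_per_byte(x))` should equal `popcnt_total(x)`, which is the relationship between the two formats described above.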
Changeset: 5a8a9fdf Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/5a8a9fdfa599e8939a5c6675883a92c869474979 Stats: 228 lines in 6 files changed: 199 ins; 21 del; 8 mod 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities Reviewed-by: lucy, aph ------------- PR: https://git.openjdk.org/jdk/pull/19509 From aboldtch at openjdk.org Wed Jun 12 14:09:21 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 12 Jun 2024 14:09:21 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: <98ckfvy2mFg7sXNA3mjWJPaj-yN-AIgzXVlHGxCTcoE=.53cdad58-a87b-4a81-a09d-c89fb7908123@github.com> On Tue, 4 Jun 2024 15:48:28 GMT, Johan Sj?len wrote: >> The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` >> >> Where >> * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` >> * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` >> * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` >> >> So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` >> >> Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. >> >> The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. 
(effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. >> >> Running testing tier1-7 > > Hi Axel, > > I don't understand why the patch isn't just `size_t alloc_size = size + value.effective_length()`, as this would be `sizeof(void*) + sizeof(Symbol) + sizeof()`. Could you explain that, please? > > Thank you. Thanks for the reviews. Discussed with @jdksjolen offline. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19214#issuecomment-2163098717 From aboldtch at openjdk.org Wed Jun 12 14:09:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 12 Jun 2024 14:09:22 GMT Subject: Integrated: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: On Mon, 13 May 2024 12:30:38 GMT, Axel Boldt-Christmas wrote: > The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. 
The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` > > Where > * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` > * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` > * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` > > So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` > > Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. > > The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. > > Running testing tier1-7 This pull request has now been integrated. 
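The size arithmetic in the description can be checked with a small model. This is a sketch only: `SizeModel`, `kNodeMetadata`, `old_alloc_size` and `needed_alloc_size` are hypothetical names, `header_size` stands in for `sizeof(Symbol)`, and the actual fix queries the CHT for the node allocation size rather than using these helpers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a dynamically sized, inlined value.
struct SizeModel {
  size_t header_size;  // plays the role of sizeof(Symbol)
  size_t payload;      // bytes stored inline after the header

  // Full dynamic size of the value: header plus inline payload.
  size_t byte_size() const { return header_size + payload; }
  // Only the part that extends beyond the static header.
  size_t effective_length() const { return payload; }
};

// Per-node metadata, modelled as a single pointer as in the description.
constexpr size_t kNodeMetadata = sizeof(void*);

// The calculation quoted in the description: the static node size already
// contains the value header, and byte_size() already contains the payload,
// so both end up being counted twice.
inline size_t old_alloc_size(const SizeModel& v) {
  size_t size = kNodeMetadata + v.header_size;  // models sizeof(Node)
  return size + v.byte_size() + v.effective_length();
}

// What is actually needed: node metadata plus one dynamic value.
inline size_t needed_alloc_size(const SizeModel& v) {
  return kNodeMetadata + v.byte_size();
}
```

With an 8-byte header and a 24-byte payload, the old formula yields `sizeof(void*) + 64` where `sizeof(void*) + 32` would suffice, matching the "2 * dynamic_sizeof(Symbol)" observation above.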
Changeset: 2c1da6c6 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/2c1da6c6fa2e50856ea71c0e266961171bee1037 Stats: 22 lines in 4 files changed: 17 ins; 2 del; 3 mod 8332139: SymbolTableHash::Node allocations allocates twice the required memory Reviewed-by: iwalulya, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/19214 From rrich at openjdk.org Wed Jun 12 14:17:17 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 12 Jun 2024 14:17:17 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: <3o6fTTneEk0a3_lQyQ2K4Yhg_8MTAr5hOC8z-PaAvmg=.acaf8112-56cb-4d91-bc47-aacdc2280563@github.com> On Wed, 29 May 2024 08:14:29 GMT, Martin Doerr wrote:
>> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
>> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea?
>> How can we verify it? By comparing the performance using the micro benchmarks?
>>
>> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
>> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
>> SecondarySupersLookup.testNegative61 avgt 15 ...
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix check for sign bit.

src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2210:

> 2208: // data.
> 2209: assert(Array::base_offset_in_bytes() == wordSize, "Adjust this code");
> 2210: assert(Array::length_offset_in_bytes() == 0, "Adjust this code");

I don't understand why the assertion for `Array::length_offset_in_bytes()` is needed. Isn't it sufficient to assert `Array::base_offset_in_bytes() == wordSize`? What would break if `Array::length_offset_in_bytes() == 4`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1636557697 From jsjolen at openjdk.org Wed Jun 12 14:34:39 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 12 Jun 2024 14:34:39 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v15] In-Reply-To: References: Message-ID: <9gtY0aKcNme3bSB9FI_Zj6AWCr7Q2fIsmvvvksCIAJs=.242ee65e-9ad8-4aa3-a2fc-992db8360188@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Remove perf test - Add is_nil() to the CHeap and Arena allocators - Hopefully the last fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/63413da0..6739241c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=13-14 Stats: 88 lines in 2 files changed: 7 ins; 77 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From mdoerr at openjdk.org Wed Jun 12 14:35:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 12 Jun 2024 14:35:42 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v5] In-Reply-To: References: Message-ID:
> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea?
> How can we verify it? By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op
> SecondarySupersLookup.testNegative62 avgt 15 ...
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Remove pointless assertion. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19368/files - new: https://git.openjdk.org/jdk/pull/19368/files/14fc650f..1736aa6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368 PR: https://git.openjdk.org/jdk/pull/19368 From mdoerr at openjdk.org Wed Jun 12 14:35:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 12 Jun 2024 14:35:42 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v4] In-Reply-To: <3o6fTTneEk0a3_lQyQ2K4Yhg_8MTAr5hOC8z-PaAvmg=.acaf8112-56cb-4d91-bc47-aacdc2280563@github.com> References: <3o6fTTneEk0a3_lQyQ2K4Yhg_8MTAr5hOC8z-PaAvmg=.acaf8112-56cb-4d91-bc47-aacdc2280563@github.com> Message-ID: On Wed, 12 Jun 2024 14:14:25 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix check for sign bit. > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2210: > >> 2208: // data. >> 2209: assert(Array::base_offset_in_bytes() == wordSize, "Adjust this code"); >> 2210: assert(Array::length_offset_in_bytes() == 0, "Adjust this code"); > > I don't understand why the assertion for `Array::length_offset_in_bytes()` is needed. > Isn't it sufficient to assert `Array::base_offset_in_bytes() == wordSize`? > What would break if `Array::length_offset_in_bytes() == 4`? The assertion is pointless. I've removed it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1636589017 From jsjolen at openjdk.org Wed Jun 12 14:45:15 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 12 Jun 2024 14:45:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v15] In-Reply-To: <9gtY0aKcNme3bSB9FI_Zj6AWCr7Q2fIsmvvvksCIAJs=.242ee65e-9ad8-4aa3-a2fc-992db8360188@github.com> References: <9gtY0aKcNme3bSB9FI_Zj6AWCr7Q2fIsmvvvksCIAJs=.242ee65e-9ad8-4aa3-a2fc-992db8360188@github.com> Message-ID: On Wed, 12 Jun 2024 14:34:39 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... 
Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with three additional commits since the last revision: > > - Remove perf test > - Add is_nil() to the CHeap and Arena allocators > - Hopefully the last fix I had a go at making the `_idx` field const and hiding the constructor from external users. It turns out this is super painful, as we suddenly need to accommodate `GrowableArray`s long list of expected behavior from `E`. This means implementing constructors/copy constructors/copy assignment for I and BackingElement, much more than just a simple data carrier. Annoying, but doable. Personally, I'm fine leaving it as is, but I'd like the reviewers' opinions on this. With regards to double frees: I'm not entirely sure on how to implement this yet, as we don't have a tagged union so we can't discern between freeing an actual element and a pointer in the freelist. The obvious, but computationally expensive, solution is to traverse the freelist and see if we find the index which is being freed. The other solution is to, of course, tag the union at debug time, but that adds size to the elements. I think it might be better to do it with a tag, but that circles back to the issue that Thomas had with `I::_owner`. 
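To make the double-free discussion concrete, here is a minimal sketch of an index-based freelist with the "traverse the freelist" check described above. It is not the `IndexedFreelistAllocator` from the PR: it uses `std::vector` instead of `GrowableArray`, and the names `IndexedFreelist`, `Index`, `nil`, `is_on_freelist` and `deallocate` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical sketch of an allocator handing out 4-byte indices.
template <typename T>
class IndexedFreelist {
  static_assert(std::is_trivial<T>::value, "sketch only supports trivial T");

public:
  using Index = int32_t;
  static constexpr Index nil = -1;

  // Reuse a slot from the freelist if possible, otherwise grow the array.
  Index allocate(const T& value) {
    Index idx;
    if (_free_head != nil) {
      idx = _free_head;
      _free_head = slot(idx).next_free;
    } else {
      idx = static_cast<Index>(_slots.size());
      _slots.push_back(Slot{});
    }
    slot(idx).value = value;
    return idx;
  }

  // The untagged union cannot tell a live element from a freelist link,
  // so the only way to know is to walk the list: O(length of freelist).
  bool is_on_freelist(Index idx) const {
    for (Index cur = _free_head; cur != nil; cur = slot(cur).next_free) {
      if (cur == idx) return true;
    }
    return false;
  }

  // Returns false (and does nothing) for nil, out-of-range or
  // already-freed indices; the backing array never shrinks.
  bool deallocate(Index idx) {
    if (idx < 0 || static_cast<size_t>(idx) >= _slots.size()) return false;
    if (is_on_freelist(idx)) return false;  // double free detected
    slot(idx).next_free = _free_head;
    _free_head = idx;
    return true;
  }

  T& at(Index idx) { return slot(idx).value; }

private:
  // A slot is either a live element or a link to the next free slot.
  union Slot {
    T value;
    Index next_free;
  };

  Slot& slot(Index idx) { return _slots[static_cast<size_t>(idx)]; }
  const Slot& slot(Index idx) const { return _slots[static_cast<size_t>(idx)]; }

  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

The linear walk in `is_on_freelist` is the computational cost mentioned above; the alternative of a debug-only tag would make the check constant-time at the price of a larger `Slot`.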
------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2163200919 From jsjolen at openjdk.org Wed Jun 12 14:49:32 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 12 Jun 2024 14:49:32 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v16] In-Reply-To: References: Message-ID: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Return on free if is_nil() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/6739241c..4b210c8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Wed Jun 12 15:23:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 12 Jun 2024 15:23:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v16] In-Reply-To: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> References: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> Message-ID: On Wed, 12 Jun 2024 14:49:32 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. 
>> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Return on free if is_nil() I need to take a think about this, it's quite annoying that we're generating a new `I` per `IFLA`. I think that we could make it a global struct `IndexPointer`, or something like that. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2163311133 From jonathanjoo at google.com Wed Jun 12 23:32:28 2024 From: jonathanjoo at google.com (Jonathan Joo) Date: Wed, 12 Jun 2024 16:32:28 -0700 Subject: Adaptable Heap Sizing for G1 GC Message-ID: Hello hotspot-dev and hotspot-gc-dev, I'd like to reopen discussion on Adaptable Heap Sizing (AHS) for the G1 Garbage Collector, since we now have some time to dedicate to bringing this effort to the OpenJDK Community. Please see https://mail.openjdk.org/pipermail/hotspot-gc-dev/2022-September/040096.html for the original thread. The bullet points contained in the above link are still largely the same, and we have made significant improvements to the service over the past few years, and found success deploying it broadly across jobs internally. Now that we feel the feature has matured, we'd like to introduce it to the OpenJDK community in hopes that it can be adopted for broader use. In short - the goal of Adaptable Heap Sizing is to improve memory usage and reduce OOMs for Java applications, especially those deployed in containerized environments. The key insights are as follows: 1. Applications with low memory requirements but configured with high RAM often use RAM unnecessarily. We can utilize GC CPU overhead metrics to help guide heap sizing, allowing for RAM savings in these scenarios. 2. For Java applications running in containers, we can bound Java heap usage based on our knowledge of the current container memory usage as well as the current container size, to prevent container OOMs. The implementation of AHS currently involves some fairly lightweight changes to the JVM, through the introduction of two new manageable flags. 
They are essentially the same as these two (open feature requests): - https://bugs.openjdk.org/browse/JDK-8236073 - https://bugs.openjdk.org/browse/JDK-8204088 In addition, we have a separate thread (outside of the JVM, in our custom Java launcher) which reads in GC CPU overhead data and container information, and calculates appropriate values for these two flags. We call this the AHS worker thread, and this thread updates frequently (currently every second). The vast majority of the AHS logic is in this worker thread - the introduction of the new JVM flags above simply gives AHS a way to tune GC heuristics given this additional information. Thomas Schatzl mentioned there is a similar-sounding effort going on in ZGC, and also there were folks outside of Google who expressed interest in this project, so I think it is an appropriate time to discuss this again on an open forum. Given the positive results we've had deploying AHS internally at Google, we feel this is a valuable feature to the broader Java community that should be able to be leveraged by all to achieve more stable and efficient Java heap behavior. I'd appreciate hearing people's thoughts on this. Thank you! ~ Jonathan (P.S. For more information, a talk given about this project can be viewed here, though it is somewhat dated.) From dholmes at openjdk.org Thu Jun 13 02:27:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Jun 2024 02:27:17 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v9] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 05:23:41 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters.
The flag is set to false by default and will be enabled if `-Xlog:init` is specified. >> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. >> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > remove log_vm_stats() Changes requested by dholmes (Reviewer). src/hotspot/share/runtime/java.cpp line 362: > 360: if (log_is_enabled(Info, perf, class, link)) { > 361: ClassLoader::print_counters(tty); > 362: } Again this needs to comment why we check the log is active but write to tty instead of the logstream. Or we could put the `log_is_enabled` check inside `print_counters` rather than require callers to do it. That also allows us to print that the counters are disabled eg. void ClassLoader::print_counters(outputStream *st) { // The counters are only active if the logging is enabled, but // we print to the passed in outputStream as requested. if (log_is_enabled(Info, perf, class, link)) { st->print_cr("ClassLoader:"); ... } else { st->print_cr("ClassLoader: "); } ------------- PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2114548996 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1637396085 From dholmes at openjdk.org Thu Jun 13 02:50:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Jun 2024 02:50:13 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 10:16:32 GMT, Lutz Schmidt wrote: >> Sounds like a good idea. 
See also the discussion about `ATTRIBUTE_NO_UBSAN` here https://github.com/openjdk/jdk/pull/19597 > > I like the idea of an encapsulating macro as well. I guess ATTRIBUTE_NO_UBSAN will do if we have a precedent for that naming. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19630#discussion_r1637422903 From iklam at openjdk.org Thu Jun 13 04:40:47 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 13 Jun 2024 04:40:47 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v6] In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. > > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. > > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. 
At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - Added test case for safety with putfield against final fields (related to JDK-8157181) - Moved the test ResolvedConstants.java to resolvedConstants, as we will have more tests cases in this area - @DanHeidinga comments - Fixed typo in previous commit - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - @matias9927 comments - moved remove_resolved_field_entries_if_non_deterministic() to cpCache - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - 8293980: Resolve CONSTANT_FieldRef at CDS dump time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19355/files - new: https://git.openjdk.org/jdk/pull/19355/files/58e08e18..828683f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=04-05 Stats: 54223 lines in 1318 files changed: 33196 ins; 15947 del; 5080 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From ccheung at openjdk.org Thu Jun 13 05:00:33 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 13 Jun 2024 05:00:33 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v10] In-Reply-To: References: Message-ID: <-l1P3QkSX4ynnQA_sEQmaTUK63x6X30hF72n-FmoZuI=.dd3cc763-be18-4fd5-8478-f2a777769893@github.com> > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. 
There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: check log_is_enable() in ClassLoader::print_counters ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18790/files - new: https://git.openjdk.org/jdk/pull/18790/files/8a46e632..d0c857c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=08-09 Stats: 17 lines in 2 files changed: 3 ins; 2 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From ccheung at openjdk.org Thu Jun 13 05:03:15 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 13 Jun 2024 05:03:15 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v9] In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 02:19:16 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> remove log_vm_stats() > > src/hotspot/share/runtime/java.cpp line 362: > >> 360: if (log_is_enabled(Info, perf, class, link)) { >> 361: ClassLoader::print_counters(tty); >> 362: } > > Again this needs to comment why we check the log is active but write to tty instead of the logstream. > > Or we could put the `log_is_enabled` check inside `print_counters` rather than require callers to do it. That also allows us to print that the counters are disabled eg. 
> > void ClassLoader::print_counters(outputStream *st) { > // The counters are only active if the logging is enabled, but > // we print to the passed in outputStream as requested. > if (log_is_enabled(Info, perf, class, link)) { > st->print_cr("ClassLoader:"); > ... > } else { > st->print_cr("ClassLoader: "); > } I've made the above changes as you suggested without the else part to be consistent with printing of other statistics. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1637537375 From stuefe at openjdk.org Thu Jun 13 05:06:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Jun 2024 05:06:13 GMT Subject: RFR: 8333994: NMT: call stacks should show source information In-Reply-To: References: Message-ID: <98_0fQBMuoVwLgIA71z04hFVsf-1fZIEdlr8JuN9-14=.5f828abf-47be-44fd-9eab-763df685a1b2@github.com> On Tue, 11 Jun 2024 12:38:09 GMT, Thomas Stuefe wrote: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. Ping @jdksjolen @afshin-zafari @gerard-ziemski? Its simple. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2164381763 From gcao at openjdk.org Thu Jun 13 05:09:19 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 13 Jun 2024 05:09:19 GMT Subject: RFR: 8334078: TestIntVect.java fails without RVV after JDK-8332153 Message-ID: Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information. As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases. 
### Testing - [ ] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) - [ ] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug) - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug) ------------- Commit messages: - Simplify strcat - 8334078: TestIntVect.java fails without RVV after JDK-8332153 Changes: https://git.openjdk.org/jdk/pull/19686/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19686&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334078 Stats: 207 lines in 14 files changed: 3 ins; 149 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/19686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19686/head:pull/19686 PR: https://git.openjdk.org/jdk/pull/19686 From fyang at openjdk.org Thu Jun 13 05:52:12 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Jun 2024 05:52:12 GMT Subject: RFR: 8334078: TestIntVect.java fails without RVV after JDK-8332153 In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 03:24:15 GMT, Gui Cao wrote: > Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information. > > As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases. > > ### Testing > - [ ] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) > - [ ] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug) > - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug) Nice cleanup! Thanks! Suggestion about the JBS title: `RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV` ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19686#pullrequestreview-2114761145 From rehn at openjdk.org Thu Jun 13 07:25:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 13 Jun 2024 07:25:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v8] In-Reply-To: References: <4TVE4G5mkc7U0LI7pj_rd6jbG0vGyFTXWhwX86wA4tg=.3f823c2e-136f-431f-8e06-d61c652f36d2@github.com> <_MuwK3i7Ru8buBWkrto34BYlmlfDFPKDBwsp3Vaq2S8=.93e16dca-260a-4e4d-b414-fc66412bd5c7@github.com> Message-ID: On Wed, 12 Jun 2024 07:42:54 GMT, Fei Yang wrote: > Ah, that doesn't seem to be easier than before ~. I am not familiar with s390 and I didn't see where it updates the target address in CP. I will take another look at our current approach and see if there are further improvements we can do. I have tested a few different approaches; as we are crossing code sections, we need to track this. The two that seem most reasonable are: - Add offset to the four relocators static/virt/opt/runtime as RV only. When we set the destination on relocation we update the instructions to point to CP + offset (where we get offset from relocator), and finally set the new destination at this location. - Add a second relocator to the call which tracks the constant pool address. When we set the destination on relocation we look up the second relocator and get the address from it, and finally set the dest at this location. - Maybe there is a smarter way not apparent to me? As there are a lot of assumptions about relocators, both require random shared code changes. E.g. CodeBuffers with 0 sized CP, assuming only one relocator for PC-range, etc... I suggest this requires a more elaborate fix as I would want to add support for global table (in range of auipc+ld from CC) where we can put shared addresses. I think the second solution with additional relocator where we just get the address, which then could either be local CP or global CP would be best?
Hence I'm suggesting this work would be done outside this PR, and suggest we keep using stubs for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2164794944 From dholmes at openjdk.org Thu Jun 13 07:51:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Jun 2024 07:51:15 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v10] In-Reply-To: <-l1P3QkSX4ynnQA_sEQmaTUK63x6X30hF72n-FmoZuI=.dd3cc763-be18-4fd5-8478-f2a777769893@github.com> References: <-l1P3QkSX4ynnQA_sEQmaTUK63x6X30hF72n-FmoZuI=.dd3cc763-be18-4fd5-8478-f2a777769893@github.com> Message-ID: On Thu, 13 Jun 2024 05:00:33 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. >> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. >> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > check log_is_enable() in ClassLoader::print_counters Okay. But lets see if @iklam is happy with where things ended up before integrating. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2114999956 From jsjolen at openjdk.org Thu Jun 13 08:14:11 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 13 Jun 2024 08:14:11 GMT Subject: RFR: 8333994: NMT: call stacks should show source information In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 12:38:09 GMT, Thomas Stuefe wrote: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. LGTM, but I'd like another reviewer. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19655#pullrequestreview-2115086433 From stuefe at openjdk.org Thu Jun 13 08:20:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Jun 2024 08:20:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v16] In-Reply-To: References: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> Message-ID: On Wed, 12 Jun 2024 15:20:51 GMT, Johan Sjölen wrote: > I need to take a think about this, it's quite annoying that we're generating a new `I` per `IFLA`. I think that we could make it a global struct `IndexPointer`, or something like that. TBH, I am quite opinionated about this. One example: Yesterday, I tried to rework the VMATreeTest.TestConsistencyWithSimpleTracker_vm. See issue https://bugs.openjdk.org/browse/JDK-8334179. I first tried to implement a more efficient memory layout for the SimpleVMATracker (everything in SimpleVMATracker::Info could, essentially, be stored in a single byte, giving us a much better cache performance). But for that, I would have needed the numerical value of the stack index. But that is hidden away in NativeCallStackStorage::StackIndex.
Apart from giving me RSI from typing, I cannot just extract the integer from NativeCallStackStorage::StackIndex since its sole purpose in life is to hide that integer. Of course, I can declare yet another friend class. But that's ugly and does not scale. I really abhor the friend class concept since it subverts good encapsulation. It means I have different classes that can freely rummage around in my innards. So I need IDE introspection to actually know whats going on. For a terrible example of how complex this gets see Chunk and Arena and all its child classes, and try to figure out who does modify the Arena innards when and why. Compare that with Metaspace, which is the spiritual successor of Arenas. Much better separation of concerns and encapsulation. I can kind of see the point with test classes, but even then, a better way would be to make the test class an inner class. TL;DR: I think these wrappers make life difficult while giving only minuscule protection against errors - if any at all. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2164955577 From stuefe at openjdk.org Thu Jun 13 08:37:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Jun 2024 08:37:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v16] In-Reply-To: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> References: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> Message-ID: On Wed, 12 Jun 2024 14:49:32 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. 
>> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Return on free if is_nil() Some general remarks, (apart from my remark about the integer wrapper), and then I wait until you say its ready for review. 1) I would remove/not bother with the Allocators. There isn't much of a point, and small code is good code. 
The arena version is particularly questionable since Arenas don't really support arbitrary deallocation. I would not want anyone to use this allocator in real code. 2) I would like to have a non-growing version. Optionally, one where I can hand in an address range, and that gets used. Could possibly be combined for simplicity (if you specify a range, it's a non-growing array). Reason: I want to be able to place stuff that needs to be address-stable, and I often need to do this in pre-allocated ranges. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2165000046 From erik.osterlund at oracle.com Thu Jun 13 11:17:52 2024 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 13 Jun 2024 11:17:52 +0000 Subject: Adaptable Heap Sizing for G1 GC In-Reply-To: References: Message-ID: <8717965B-DD60-4D97-8AA8-564194083D51@oracle.com> Hi Jonathan, I'm currently working on automatic heap sizing for ZGC. My vision is that users shouldn't have to set heap sizes. Would love to see that in G1 as well. What you are describing sounds like it would do something similar. Having said that, it seems like the concrete changes you are proposing for OpenJDK would not actually yield automatic heap sizing for the user. By the sound of it, you would need your special launcher with an extra thread that contains the actual heap sizing policy. The proposed JVM changes are mostly for being *able* to change the heap sizing policies externally, but without any policy shipped that actually changes it. While having a pluggable policy is great because anyone can put in their own favourite policy, there is also an obvious disadvantage that 99.9% of deployments won't have any special launcher or supplier of an external heap sizing policy, or even know what we are talking about. Therefore, unless we also ship the policies, I unfortunately think that limits the usefulness of the feature.
If, however, a policy was shipped so the heap can be sized automatically, I think that would make it much more widely useful. In my automatic heap sizing work, the goal is to ship both the mechanisms and the policies needed to automatically size (and resize) the heap, adapting to changing load and environments. Are you open to the idea of shipping a policy that actually changes the heap size as well? It would be great to be aligned on this, I think. Thanks, /Erik On 13 Jun 2024, at 01:32, Jonathan Joo wrote: Hello hotspot-dev and hotspot-gc-dev, I'd like to reopen discussion on Adaptable Heap Sizing (AHS) for the G1 Garbage Collector, since we now have some time to dedicate to bringing this effort to the OpenJDK Community. Please see https://mail.openjdk.org/pipermail/hotspot-gc-dev/2022-September/040096.html for the original thread. The bullet points contained in the above link are still largely the same, and we have made significant improvements to the service over the past few years, and found success deploying it broadly across jobs internally. Now that we feel the feature has matured, we'd like to introduce it to the OpenJDK community in hopes that it can be adopted for broader use. In short - the goal of Adaptable Heap Sizing is to improve memory usage and reduce OOMs for Java applications, especially those deployed in containerized environments. The key insights are as follows: 1. Applications with low memory requirements but configured with high RAM often use RAM unnecessarily. We can utilize GC CPU overhead metrics to help guide heap sizing, allowing for RAM savings in these scenarios. 2. For Java applications running in containers, we can bound Java heap usage based on our knowledge of the current container memory usage as well as the current container size, to prevent container OOMs. The implementation of AHS currently involves some fairly lightweight changes to the JVM, through the introduction of two new manageable flags. 
They are essentially the same as these two (open feature requests): * https://bugs.openjdk.org/browse/JDK-8236073 * https://bugs.openjdk.org/browse/JDK-8204088 In addition, we have a separate thread (outside of the JVM, in our custom Java launcher) which reads in GC CPU overhead data and container information, and calculates appropriate values for these two flags. We call this the AHS worker thread, and this thread updates frequently (currently every second). The vast majority of the AHS logic is in this worker thread - the introduction of the new JVM flags above simply gives AHS a way to tune GC heuristics given this additional information. Thomas Schatzl mentioned there is a similar-sounding effort going on in ZGC, and also there were folks outside of Google who expressed interest in this project, so I think it is an appropriate time to discuss this again on an open forum. Given the positive results we've had deploying AHS internally at Google, we feel this is a valuable feature to the broader Java community that should be able to be leveraged by all to achieve more stable and efficient Java heap behavior ? I'd appreciate hearing peoples' thoughts on this. Thank you! ~ Jonathan (P.S. For more information, a talk given about this project can be viewed here, though it is somewhat dated.) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amitkumar at openjdk.org Thu Jun 13 13:17:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 13 Jun 2024 13:17:43 GMT Subject: RFR: 8332602: [s390x] Improve itable_stub Message-ID: s390x Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) ------------- Commit messages: - z_r1 is free now, so we can use that - s390x Port Changes: https://git.openjdk.org/jdk/pull/19698/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19698&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332602 Stats: 158 lines in 3 files changed: 140 ins; 7 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19698/head:pull/19698 PR: https://git.openjdk.org/jdk/pull/19698 From szaldana at openjdk.org Thu Jun 13 13:45:26 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 13 Jun 2024 13:45:26 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching Message-ID: Hi all, This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. - Running the modified test with all collectors. Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. 
However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. Looking forward to your comments, Sonia ------------- Commit messages: - 8333769: Pretouching tests dont test pretouching Changes: https://git.openjdk.org/jdk/pull/19699/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19699&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333769 Stats: 240 lines in 9 files changed: 160 ins; 80 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19699.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19699/head:pull/19699 PR: https://git.openjdk.org/jdk/pull/19699 From mdoerr at openjdk.org Thu Jun 13 14:00:22 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 13 Jun 2024 14:00:22 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:52:15 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries, in a number of tests like > jdk/jfr/event/runtime/TestShutdownEvent.jtr > jdk/jfr/jvm/TestDumpOnCrash.jtr > we get those ubsan-errors : > > src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' > #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 > #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 > #2 0x7f0bd0502e7b () > #3 0x7f0bd04fe01f () > #4 0x7f0bd04fe01f () > #5 0x7f0bd04fe525 () > #6 0x7f0bd04f6c85 () > #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 > #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 > #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 > #10 0x7f0beed32cf8 in 
invokeStaticMainWithArgs src/java.base/share/native/libjli/java.c:418 > #11 0x7f0beed35894 in JavaMain src/java.base/share/native/libjli/java.c:623 > #12 0x7f0beed3cf68 in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 > #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > > Looks like we use unsafe to put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). Fine with me if you do the macro thing later. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19630#pullrequestreview-2115889145 From mbaesken at openjdk.org Thu Jun 13 14:05:17 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 13 Jun 2024 14:05:17 GMT Subject: RFR: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:52:15 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries, in a number of tests like > jdk/jfr/event/runtime/TestShutdownEvent.jtr > jdk/jfr/jvm/TestDumpOnCrash.jtr > we get those ubsan-errors : > > src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' > #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 > #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 > #2 0x7f0bd0502e7b () > #3 0x7f0bd04fe01f () > #4 0x7f0bd04fe01f () > #5 0x7f0bd04fe525 () > #6 0x7f0bd04f6c85 () > #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 > #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 > #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 > #10 0x7f0beed32cf8 in 
invokeStaticMainWithArgs src/java.base/share/native/libjli/java.c:418 > #11 0x7f0beed35894 in JavaMain src/java.base/share/native/libjli/java.c:623 > #12 0x7f0beed3cf68 in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 > #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > > Looks like we use unsafe to put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). Thanks for the reviews ! The 'macro thing' comes in a follow up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19630#issuecomment-2165764870 From mbaesken at openjdk.org Thu Jun 13 14:05:17 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 13 Jun 2024 14:05:17 GMT Subject: Integrated: 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 13:52:15 GMT, Matthias Baesken wrote: > When running with ubsan enabled binaries, in a number of tests like > jdk/jfr/event/runtime/TestShutdownEvent.jtr > jdk/jfr/jvm/TestDumpOnCrash.jtr > we get those ubsan-errors : > > src/hotspot/share/prims/unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' > #0 0x7f0be9a3e10d in MemoryAccess::put(int) src/hotspot/share/prims/unsafe.cpp:247 > #1 0x7f0be9a3e10d in Unsafe_PutInt src/hotspot/share/prims/unsafe.cpp:315 > #2 0x7f0bd0502e7b () > #3 0x7f0bd04fe01f () > #4 0x7f0bd04fe01f () > #5 0x7f0bd04fe525 () > #6 0x7f0bd04f6c85 () > #7 0x7f0be80a2972 in JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) src/hotspot/share/runtime/javaCalls.cpp:415 > #8 0x7f0be83160d8 in jni_invoke_static src/hotspot/share/prims/jni.cpp:888 > #9 0x7f0be831d875 in jni_CallStaticVoidMethod src/hotspot/share/prims/jni.cpp:1717 > #10 0x7f0beed32cf8 in 
invokeStaticMainWithArgs src/java.base/share/native/libjli/java.c:418 > #11 0x7f0beed35894 in JavaMain src/java.base/share/native/libjli/java.c:623 > #12 0x7f0beed3cf68 in ThreadJavaMain src/java.base/unix/native/libjli/java_md.c:653 > #13 0x7f0beeceb6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > > Looks like we use unsafe to put/write to 0 e.g. to cause a crash. Probably we could add an attribute to the function so that ubsan stops complaining (the put to 0 is done for a reason but ubsan cannot know this). This pull request has now been integrated. Changeset: 0d3a3771 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/0d3a3771c3777d3dd1fec8dc8faed5fd02b06830 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8333887: ubsan: unsafe.cpp:247:13: runtime error: store to null pointer of type 'volatile int' Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/19630 From rrich at openjdk.org Thu Jun 13 15:02:17 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 13 Jun 2024 15:02:17 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v5] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 14:35:42 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): >> >> Original >> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average] >> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ±
10.819 ns/op >> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op >> SecondarySupersLookup.testNegative61 avgt 15 ... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Remove pointless assertion. Hi Martin, thanks for the port. It looks good. I've only got a few minor comments. Cheers, Richard.
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2166: > 2164: > 2165: // Return true: we succeeded in generating this code > 2166: bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass, The method always returns `true`. Should it even return a value? src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2176: > 2174: assert_different_registers(r_sub_klass, r_super_klass, temp1, temp2, temp3, temp4, result); > 2175: > 2176: Label L_fallthrough; `L_done` would be a better name. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2232: > 2230: > 2231: // Linear probe. Rotate the bitmap so that the next bit to test is > 2232: // in Bit 1. It's bit 2 that's tested next after the rotation, isn't it? See L2331 in `lookup_secondary_supers_table_slow_path` Suggestion: // in Bit 2. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2274: > 2272: LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS; > 2273: > 2274: Label L_fallthrough; `L_done` would be a better name. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2328: > 2326: ldx(result, r_array_base, r_array_index); > 2327: xor_(result, result, r_super_klass); > 2328: beq(CCR0, L_fallthrough); You might add a comment `success (result == 0)`. src/hotspot/cpu/ppc/ppc.ad line 12059: > 12057: u1 super_klass_slot = ((Klass*)$super_con$$constant)->hash_slot(); > 12058: if (InlineSecondarySupersTest) { > 12059: success = __ lookup_secondary_supers_table($sub$$Register, $super_reg$$Register, `success` is always true. Can it be removed?
------------- PR Review: https://git.openjdk.org/jdk/pull/19368#pullrequestreview-2113235016 PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1638232248 PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1636593154 PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1637747037 PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1637842324 PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1637847137 PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1638229107 From jbhateja at openjdk.org Thu Jun 13 15:16:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Jun 2024 15:16:27 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs Message-ID: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. This initial patch adds following support:- 1) C2 compiler register allocation support. 2) State save restoration while transitioning from C2 JIT compiled code to runtime services. 3) Applicable extensions to native interface used by runtime for patching instruction. We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves remaining register for special purpose. Patch has been regressed over stand alone test points after merging with other APX support patches [1][2] under review. We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes found during testing.
[1] https://github.com/openjdk/jdk/pull/18476 [2] https://github.com/openjdk/jdk/pull/18562 PS: Intent of this draft PR is to facilitate validation of existing APX related PRs under review. ------------- Commit messages: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 - 32 bit build fix and enforced stack alignment constraints. - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 - 32-bit build fixes. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 - 8329032: C2 compiler register allocation support for APX EGPRs Changes: https://git.openjdk.org/jdk/pull/19042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329032 Stats: 738 lines in 19 files changed: 598 ins; 53 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From kvn at openjdk.org Thu Jun 13 15:16:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Jun 2024 15:16:27 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Wed, 1 May 2024 18:42:13 GMT, Jatin Bhateja wrote: > Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode.
> > This initial patch adds following support:- > 1) C2 compiler register allocation support. > 2) State save restoration while transitioning from C2 JIT compiled code to runtime services. > 3) Applicable extensions to native interface used by runtime for patching instruction. > > We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits > (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves > remaining register for special purpose. > > Patch has been regressed over stand alone test points after merging with other APX support patches [1][2] under review. > > We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes > found during testing. > > [1] https://github.com/openjdk/jdk/pull/18476 > [2] https://github.com/openjdk/jdk/pull/18562 > > PS: Intent of this draft PR is to facilitate validation of existing APX related PRs under review. src/hotspot/cpu/x86/register_x86.hpp line 85: > 83: if (UseAPX) { > 84: return number_of_registers / 2; > 85: } Should check be reversed `!UseAPX`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1631351591 From jbhateja at openjdk.org Thu Jun 13 15:24:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Jun 2024 15:24:13 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Fri, 7 Jun 2024 15:02:17 GMT, Vladimir Kozlov wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode.
>> >> This initial patch adds following support:- >> 1) C2 compiler register allocation support. >> 2) State save restoration while transitioning from C2 JIT compiled code to runtime services. >> 3) Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after merging with other APX support patches [1][2] under review. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> [1] https://github.com/openjdk/jdk/pull/18476 >> [2] https://github.com/openjdk/jdk/pull/18562 >> >> PS: Intent of this draft PR is to facilitate validation of existing APX related PRs under review. > > src/hotspot/cpu/x86/register_x86.hpp line 85: > >> 83: if (UseAPX) { >> 84: return number_of_registers / 2; >> 85: } > > Should check be reversed `!UseAPX`? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1638405757 From kvn at openjdk.org Thu Jun 13 15:33:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Jun 2024 15:33:37 GMT Subject: RFR: 8333819: Move embedded external addresses from relocation info into separate global table Message-ID: Currently we have oops and metadata in nmethod's local data sections which are referenced by index from relocation info. On other hand external addresses (declared with `ExternalAddress` in assembler code) are embedded into relocation info data because they don't need to be patched during normal execution. 
But based on my experiments, we usually use the same external addresses in generated code. JavacBench runs with the product VM show that about 4000 C1 and C2 compiled nmethods reference only 11 external addresses during execution, which means these addresses are duplicated in a lot of relocation info data. There is also an issue with embedded addresses in relocation info in the Leyden project. Because these VM external addresses can change between runs, we have to update/patch them when we load cached code and its relocation info. But because addresses are packed (compressed), the number of bytes used for an address in relocation info may not be enough to pack the new address, so we have to throw out such cached code. I suggest moving external addresses from relocation info into a separate global table and using indexes to access it. It is similar to what we do for oops and metadata addresses, so I used a mechanism similar to OopRecorder but simplified. The results show that relocation info is reduced by 10% in the product VM and by 32% in the debug VM (which shared 145 external addresses). That is a few percent reduction in nmethod size, and so a more compact CodeCache. This should not affect generated code performance, only slightly affect compilation time (I need to use a lock to access the table), because we only modify how `external_data_relocation` is processed. Tested tier1-5, stress, xcomp. Performance testing.
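For readers following the thread, the idea of replacing embedded addresses with indexes into one shared table can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code from the PR: the class name `ExternalAddressTable`, its methods, and the use of `std::vector` and `std::mutex` are all hypothetical stand-ins (HotSpot would presumably use its own `GrowableArray` and lock types).

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical global table: each distinct external address is stored once,
// and a relocation record would carry a small index instead of the address.
class ExternalAddressTable {
 public:
  // Returns the index of addr, appending it if it has not been seen before.
  // A linear search is used here on the assumption that the table stays
  // small (the message above mentions 11 to 145 distinct addresses).
  int index_of(const void* addr) {
    std::lock_guard<std::mutex> guard(_lock);
    for (std::size_t i = 0; i < _addresses.size(); i++) {
      if (_addresses[i] == addr) {
        return static_cast<int>(i);
      }
    }
    _addresses.push_back(addr);
    return static_cast<int>(_addresses.size() - 1);
  }

  // Resolves an index back to the address it stands for.
  const void* at(int index) {
    std::lock_guard<std::mutex> guard(_lock);
    return _addresses.at(static_cast<std::size_t>(index));
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(_lock);
    return _addresses.size();
  }

 private:
  std::mutex _lock;                     // stands in for a VM lock
  std::vector<const void*> _addresses;  // index -> external address
};
```

In this sketch, many compiled methods that reference the same runtime address would all record the same small index, and a changed address only has to be updated in one table slot rather than repacked inside every relocation record.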
------------- Commit messages: - 8333819: Move embedded external addresses from relocation info into separate global table Changes: https://git.openjdk.org/jdk/pull/19703/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19703&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333819 Stats: 93 lines in 8 files changed: 69 ins; 12 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19703.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19703/head:pull/19703 PR: https://git.openjdk.org/jdk/pull/19703 From amitkumar at openjdk.org Thu Jun 13 15:59:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 13 Jun 2024 15:59:14 GMT Subject: RFR: 8332602: [s390x] Improve itable_stub In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 13:10:35 GMT, Amit Kumar wrote: > s390x Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Testing: I ran `tier1` test on fastdebug & release VM; I didn't see any regression there; src/hotspot/cpu/s390/macroAssembler_s390.hpp line 680: > 678: Register r_temp2, > 679: int itable_index, > 680: Label& nl_no_such_interface); Suggestion: Label& L_no_such_interface); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19698#discussion_r1638456278 From kvn at openjdk.org Thu Jun 13 16:00:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 13 Jun 2024 16:00:15 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: <42ywup78ayweDBNWyfIl--7DYH5oSQ8pvkVT-NrIe0k=.97345383-791f-4d8d-a3b0-40b4518bb101@github.com> On Wed, 1 May 2024 18:42:13 GMT, Jatin Bhateja wrote: > Intel® Advanced Performance Extensions (Intel®
APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. > > Summary of changes introduced along with this patch:- > > 1. C2 compiler register allocation support. > 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. > 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. > 4. Applicable extensions to native interface used by runtime for patching instruction. > > We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits > (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves > remaining register for special purpose. > > Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. > > We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes > found during testing. > > Kindly review and share your feedback. > > Best Regards, > Jatin Looks like there is issue with 32-bit builds. See GHA ` linux-x86 / build ` failure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19042#issuecomment-2166085003 From rehn at openjdk.org Thu Jun 13 17:26:35 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 13 Jun 2024 17:26:35 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. 
> But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get hot fast calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch suggests that, to solve the problems with trampolines, we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits: - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Remove tmp file - Prepare for dynamic NativeCall size - Only allow one calling convetion, i.e. fixed sized - Merge branch 'master' into 8332689 - Review comments - Move shart/far code to cpp - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=10 Stats: 908 lines in 16 files changed: 652 ins; 162 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From jsjolen at openjdk.org Thu Jun 13 20:56:41 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 13 Jun 2024 20:56:41 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v17] In-Reply-To: References: Message-ID: <2AoLaTsYRyYf_3skER3bSsIwXhTQgwg1gB21MrxfjMI=.a5e7bcdc-c496-45da-b8ea-b331670dca4b@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators.
It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
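The allocator described in the quoted message (a growable array addressed by 4-byte indexes, with unused slots chained into a freelist) can be sketched in standalone C++ roughly as follows. This is an illustration only and not the PR's `IndexedFreelistAllocator`: the class name `IndexedFreelistSketch`, the `Slot` layout, and the use of `std::vector` in place of `GrowableArray` are assumptions made for the example. Like the original at this point in the thread, it does not detect double frees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: elements live in a growable array and are referred to
// by a 4-byte index; freed slots are linked together and reused first.
template <typename E>
class IndexedFreelistSketch {
 public:
  using I = std::int32_t;       // the 4-byte "pointer"
  static constexpr I nil = -1;  // plays the role of a null pointer

 private:
  // For simplicity each slot holds both fields; a real implementation could
  // overlay them to save space. E must be default-constructible here.
  struct Slot {
    E value{};
    I next_free = nil;
  };
  std::vector<Slot> _slots;
  I _free_start = nil;  // head of the list of unused slots

  Slot& slot(I i) {
    assert(i >= 0 && static_cast<std::size_t>(i) < _slots.size());
    return _slots[static_cast<std::size_t>(i)];
  }

 public:
  // Reuses a freed slot if one exists, otherwise grows the array.
  template <typename... Args>
  I allocate(Args&&... args) {
    I i;
    if (_free_start != nil) {
      i = _free_start;
      _free_start = slot(i).next_free;
    } else {
      _slots.emplace_back();
      i = static_cast<I>(_slots.size() - 1);
    }
    slot(i).value = E(std::forward<Args>(args)...);
    return i;
  }

  // Pushes the slot onto the freelist; the array itself never shrinks.
  void free(I i) {
    if (i == nil) return;
    slot(i).next_free = _free_start;
    _free_start = i;
  }

  E& at(I i) { return slot(i).value; }
  std::size_t capacity() const { return _slots.size(); }
};
```

Because callers hold indexes rather than raw pointers, the backing array can be reallocated as it grows without invalidating any outstanding handle, which is what allows each reference to stay at 4 bytes.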
Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Extraneous space - Stüfe knows what's good for the code - Remove public specifier ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/4b210c8a..c68e2c3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=15-16 Stats: 115 lines in 2 files changed: 2 ins; 99 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Thu Jun 13 20:56:41 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 13 Jun 2024 20:56:41 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v16] In-Reply-To: References: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> Message-ID: <7hncnmyUvOkawWPJY6zfCvM_vcT_UG07v9hYrXxk640=.e7ea7891-b699-479b-99b8-bdc0a854081e@github.com> On Thu, 13 Jun 2024 08:16:04 GMT, Thomas Stuefe wrote: >But for that, I would have needed the numerical value of the stack index. But that is hidden away in NativeCallStackStorage::StackIndex. Apart from giving me RSI from typing, I cannot just extract the integer from NativeCallStackStorage::StackIndex since its sole purpose in life is to hide that integer. Of course, I can declare yet another friend class. But that's ugly and does not scale. Alright, I've crumbled. I changed it to `using I = int32_t;`. Feel free to do the same with `StackIndex`. >I can kind of see the point with test classes, but even then, a better way would be to make the test class an inner class. Noo, it's so painful to have test code embedded within prod code. >I would remove/not bother with the Allocators.
There isn't much of a point, and small code is good code. The arena version is particularly questionable since Arenas don't really support arbitrary deallocation. I would not want anyone to use this allocator in real code. That's fine, I've removed the performance testing. >I would like to have a non-growing version. Optionally, one where I can hand in an address range, and that gets used. Could possibly be combined for simplicity (if you specify a range, it's a non-growing array). Reason: I want to be able to place stuff that needs to be address-stable, and I often need to do this in pre-allocated ranges. I see the value in that, but can we do that as a separate PR? Is this a Metaspace thing, by the way? **Note:** We still have no checks for double-frees. That'll need some baking in my head. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2166774789 From jsjolen at openjdk.org Thu Jun 13 20:59:14 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 13 Jun 2024 20:59:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v16] In-Reply-To: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> References: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> Message-ID: <5wBwbzy6_pZOXfhebzDZ8j5IoKgSHuG0j_FHrpejP5Y=.54e97e89-79d9-4612-a8f1-0f16c94d09db@github.com> On Wed, 12 Jun 2024 14:49:32 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table.
This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Return on free if is_nil() Also: Obviously, feel fine with removing `StackIndex` as a wrapper.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2166780621 From jsjolen at openjdk.org Thu Jun 13 21:02:33 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 13 Jun 2024 21:02:33 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v18] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Remove dead CHeap allocator test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/c68e2c3a..17cd6b44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=16-17 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From iklam at openjdk.org Thu Jun 13 21:18:20 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 13 Jun 2024 21:18:20 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v10] In-Reply-To: <-l1P3QkSX4ynnQA_sEQmaTUK63x6X30hF72n-FmoZuI=.dd3cc763-be18-4fd5-8478-f2a777769893@github.com> References: <-l1P3QkSX4ynnQA_sEQmaTUK63x6X30hF72n-FmoZuI=.dd3cc763-be18-4fd5-8478-f2a777769893@github.com> Message-ID: On Thu, 13 Jun 2024 05:00:33 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. 
>> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. >> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > check log_is_enable() in ClassLoader::print_counters Latest version looks good. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2117046579 From iklam at openjdk.org Thu Jun 13 21:22:35 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 13 Jun 2024 21:22:35 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v7] In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: <1NX3cVcCjKCr2jCC2JodA6J-cgcGg9FP_XeRHbp5s78=.ebe0b9c2-4f76-4221-8a88-31b1fb77d1e9@github.com> > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. 
> > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. > > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... 
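
As an illustration of the `@cp` classlist format from step 1, a line such as `@cp java/util/Objects 2 19 106` could be split into a class name and a list of constant pool indices roughly as follows. This is only a sketch: `CpLine` and `parse_cp_line` are hypothetical names invented for the example, and the real parsing is done by HotSpot's `ClassListParser`, whose code is not reproduced here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one parsed "@cp" line.
struct CpLine {
  std::string class_name;      // e.g. "java/util/Objects"
  std::vector<int> indices;    // constant pool indices resolved in the training run
};

// Returns true and fills `out` if `line` has the form "@cp <class> <idx> <idx> ...".
// `out` is left untouched when the line is rejected.
bool parse_cp_line(const std::string& line, CpLine& out) {
  std::istringstream in(line);
  std::string tag;
  if (!(in >> tag) || tag != "@cp") return false;

  CpLine parsed;
  if (!(in >> parsed.class_name)) return false;

  int idx;
  while (in >> idx) {
    if (idx <= 0) return false;   // index 0 is never a valid constant pool entry
    parsed.indices.push_back(idx);
  }
  if (!in.eof()) return false;    // stopped on a non-numeric token

  out = parsed;
  return true;
}
```

For the example line from the description, this yields the class name `java/util/Objects` and the indices 2, 19 and 106.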
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed failures with -Xcomp and -Dtest.dynamic.cds.archive=true ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19355/files - new: https://git.openjdk.org/jdk/pull/19355/files/828683f5..d0b37dc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=05-06 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From iklam at openjdk.org Thu Jun 13 23:43:29 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 13 Jun 2024 23:43:29 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v8] In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. > > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. 
> > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... 
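
The safety conditions from the Overview and Limitations sections can be restated as a single predicate. The sketch below is only a restatement of the prose: `FieldRefInfo` and `can_archive_resolved` are hypothetical names, the boolean fields stand in for checks that HotSpot performs on its own data structures, and the actual logic in the PR may differ in detail.

```cpp
#include <cassert>

// Hypothetical summary of one CONSTANT_FieldRef entry in class A referring to field B.F.
struct FieldRefInfo {
  bool field_is_static;             // B.F is a static field
  bool holder_is_referrer;          // B is the same class as A
  bool holder_is_supertype;         // B is a supertype of A
  bool holder_is_vm_class;          // B is one of the vmClasses
  bool referrer_in_boot_loader;     // A is loaded by the boot class loader
  bool referrer_in_builtin_loader;  // A is loaded by the boot, platform or app loader
};

// True if the entry may stay resolved in the archive under the stated conditions;
// false means it would be reverted to the unresolved state.
bool can_archive_resolved(const FieldRefInfo& r) {
  // The boot loader is one of the built-in loaders, so these flags must agree.
  assert(!r.referrer_in_boot_loader || r.referrer_in_builtin_loader);

  if (!r.referrer_in_builtin_loader) return false;  // limitation: built-in loaders only
  if (r.field_is_static) return false;              // resolution could trigger class init
  if (r.holder_is_referrer || r.holder_is_supertype) return true;
  return r.holder_is_vm_class && r.referrer_in_boot_loader;
}
```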
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:

  Fixed failures with -Xcomp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19355/files
  - new: https://git.openjdk.org/jdk/pull/19355/files/d0b37dc2..ee7a2960

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=06-07

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19355.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355

PR: https://git.openjdk.org/jdk/pull/19355

From sspitsyn at openjdk.org Fri Jun 14 01:17:17 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 14 Jun 2024 01:17:17 GMT
Subject: RFR: 8330702: Update failure handler to don't generate Error message if cores actions are empty
In-Reply-To:
References:
Message-ID:

On Thu, 30 May 2024 02:28:56 GMT, Leonid Mesnik wrote:

> The message is generated if the cores (or any other tools) section doesn't exist or is empty. However, no tool for cores processing is currently defined, so an ERROR message is generated, which confuses users.
> The fix is to not print an error for an empty toolset, which is a valid case. The message is still generated if a tool is not defined, so that an error is reported if the name was mistyped.

This looks okay to me.

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19470#pullrequestreview-2117249931

From ccheung at openjdk.org Fri Jun 14 01:23:20 2024
From: ccheung at openjdk.org (Calvin Cheung)
Date: Fri, 14 Jun 2024 01:23:20 GMT
Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3]
In-Reply-To:
References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com>
Message-ID:

On Sat, 25 May 2024 06:41:28 GMT, Ioi Lam wrote:

>>> Okay my first reaction here is "I object!".
I get that Leyden wants to be able to easily compare startup costs between itself and mainline, but what is this costing mainline? Even if these counters are not active there is an impact on the code execution and I want to know that impact is negligible. >> >> I added some perf numbers for various startup benchmarks in the bug report [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14675860&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14675860). > >> Okay my first reaction here is "I object!". I get that Leyden wants to be able to easily compare startup costs between itself and mainline, but what is this costing mainline? Even if these counters are not active there is an impact on the code execution and I want to know that impact is negligible. > > These counters are useful in the mainline as well. We want to be able to use `java -Xlog:init` to diagnose start-up time performance for the mainline. > > The main cost of the performance counters is reading of the clock. All the new counters added in the PR are guarded by a global flag, so the cost is negligible when the logging is not enabled. Thanks @iklam, @dholmes-ora for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18790#issuecomment-2167039542 From ccheung at openjdk.org Fri Jun 14 01:23:21 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 14 Jun 2024 01:23:21 GMT Subject: Integrated: 8330198: Add some class loading related perf counters to measure VM startup In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 05:16:22 GMT, Calvin Cheung wrote: > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. 
There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. This pull request has now been integrated. Changeset: eb2488fd Author: Calvin Cheung URL: https://git.openjdk.org/jdk/commit/eb2488fd1781af49d936348d5f75731de2006ce7 Stats: 161 lines in 14 files changed: 145 ins; 7 del; 9 mod 8330198: Add some class loading related perf counters to measure VM startup Co-authored-by: Vladimir Ivanov Reviewed-by: iklam, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/18790 From amitkumar at openjdk.org Fri Jun 14 02:52:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 14 Jun 2024 02:52:41 GMT Subject: RFR: 8332602: [s390x] Improve itable_stub [v2] In-Reply-To: References: Message-ID: > s390x Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Testing: I ran `tier1` test on fastdebug & release VM; I didn't see any regression there; Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/cpu/s390/macroAssembler_s390.hpp - Update src/hotspot/cpu/s390/macroAssembler_s390.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19698/files - new: https://git.openjdk.org/jdk/pull/19698/files/f3a7a08c..8692f7ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19698&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19698&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19698/head:pull/19698 PR: https://git.openjdk.org/jdk/pull/19698 
From amitkumar at openjdk.org Fri Jun 14 02:52:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 14 Jun 2024 02:52:42 GMT Subject: RFR: 8332602: [s390x] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 02:49:44 GMT, Amit Kumar wrote: >> s390x Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) >> >> Testing: I ran `tier1` test on fastdebug & release VM; I didn't see any regression there; > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/s390/macroAssembler_s390.hpp > - Update src/hotspot/cpu/s390/macroAssembler_s390.hpp src/hotspot/cpu/s390/macroAssembler_s390.cpp line 2855: > 2853: // } > 2854: // scan_temp += itable_offset_entry_size > 2855: // } while (temp_itbl_klass != 0); Suggestion: // do { // scan_temp += itable_offset_entry_size // temp_itbl_klass = *(scan_temp); // if (temp_itbl_klass == holder_klass) { // goto holder_found; // Found! 
// } // } while (temp_itbl_klass != 0); src/hotspot/cpu/s390/macroAssembler_s390.cpp line 2871: > 2869: // while (true) { > 2870: // temp_itbl_klass = *(scan_temp); > 2871: // scan_temp += itable_offset_entry_size Suggestion: // while (true) { // scan_temp += itable_offset_entry_size // temp_itbl_klass = *(scan_temp); src/hotspot/cpu/s390/macroAssembler_s390.hpp line 678: > 676: Register r_method_result, > 677: Register r_temp, > 678: Register r_temp2, Suggestion: void lookup_interface_method_stub(Register recv_klass, Register holder_klass, Register resolved_klass, Register method_result, Register temp, Register temp2, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19698#discussion_r1639152975 PR Review Comment: https://git.openjdk.org/jdk/pull/19698#discussion_r1639154532 PR Review Comment: https://git.openjdk.org/jdk/pull/19698#discussion_r1639156209 From amitkumar at openjdk.org Fri Jun 14 02:58:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 14 Jun 2024 02:58:38 GMT Subject: RFR: 8332602: [s390x] Improve itable_stub [v3] In-Reply-To: References: Message-ID: > s390x Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Testing: I ran `tier1` test on fastdebug & release VM; I didn't see any regression there; Amit Kumar has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp - polishing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19698/files - new: https://git.openjdk.org/jdk/pull/19698/files/8692f7ad..19c3673c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19698&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19698&range=01-02 Stats: 8 lines in 2 files changed: 2 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19698.diff Fetch: git 
fetch https://git.openjdk.org/jdk.git pull/19698/head:pull/19698

PR: https://git.openjdk.org/jdk/pull/19698

From fyang at openjdk.org Fri Jun 14 03:31:13 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 14 Jun 2024 03:31:13 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7]
In-Reply-To:
References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> <3jyhG5L-3PLzTSIckYxLlCEgMD-lWgD80sAQAWmAET8=.6aef0226-a0aa-4f28-9c25-e9c877d8f810@github.com>
Message-ID:

On Fri, 7 Jun 2024 12:03:20 GMT, Robbin Ehn wrote:

>> I.e. only in the mt_safe case do we need an individual cache flush, if instructions were changed.
>
> Also I split them up because it was very confusing having relocations calling mt_safe.
> As you can see in FarCall they are quite different, so it made no sense to have both cases in the same method.

Ah, I see. That makes sense to me. Thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639178008

From gcao at openjdk.org Fri Jun 14 03:40:36 2024
From: gcao at openjdk.org (Gui Cao)
Date: Fri, 14 Jun 2024 03:40:36 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) with UseZbb enabled; UseZba and UseZbs not enabled
> Original:
>
> Benchmark                                   Mode Cnt    Score    Error  Units
> SecondarySuperCacheHits.test                avgt  15   11.375 ±  0.071  ns/op
> SecondarySuperCacheInterContention.test     avgt  15  646.087 ± 32.587  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt  15  600.090 ± 83.779  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt  15  692.084 ± 73.218  ns/op
> SecondarySupersLookup.testNegative00        avgt  15   16.420 ±  0.239  ns/op
> SecondarySupersLookup.testNegative01        avgt  15   18.307 ±  0.260  ns/op
> SecondarySupersLookup.testNegative02        avgt  15   21.695 ±  0.458  ns/op
> SecondarySupersLookup.testNegative03        avgt  15   24.855 ±  0.664  ns/op
> SecondarySupersLookup.testNegative04        avgt  15   27.305 ±  0.522  ns/op
> SecondarySupersLookup.testNegative05        avgt  15   29.719 ±  0.385  ns/op
> SecondarySupersLookup.testNegative06        avgt  15   32.231 ±  0.498  ns/op
> SecondarySupersLookup.testNegative07        avgt  15   33.747 ±  0.603  ns/op
> SecondarySupersLookup.testNegative08        avgt  15   35.856 ±  0.629  ns/op
> SecondarySupersLookup.testNegative09        avgt  15   37.077 ±  0.546  ns/op
> SecondarySupersLookup.testNegative10        avgt  15   39.408 ±  0.465  ns/op
> SecondarySupersLookup.testNegative16        avgt  15   51.041 ±  0.547  ns/op
> SecondarySupersLookup.testNegative20        avgt  15   58.722 ±  0.922  ns/op
> SecondarySupersLookup.testNegative30        avgt  15   77.310 ±  0.654  ns/op
> SecondarySupersLookup.testNegative32        avgt  15   81.116 ±  0.854  ns/op
> SecondarySupersLookup.testNegative40        avgt  15   96.311 ±  0.840  ns/op
> SecondarySupersLookup.testNegative50        avgt  15  115.427 ±  0.838  ns/op
> SecondarySupersLookup.testNegative55        avgt  15  124.371 ±  1.076  ns/op
> SecondarySupersLookup.testNegative56        avgt  15  126.796 ±  0.916  ns/op
> SecondarySupersLookup.testNegative57        avgt  15  127.952 ±  1.202  ns/op
> SecondarySupersLookup.testNegative58        avgt  15  131.956 ±  4.515  ns/op
> SecondarySupersLookup.testNegative59        avgt  15  131.858 ±  1.066  ns/op
> SecondaryS...

Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 12 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Update ins_cost for PartialSubtypeCheck - Code Format - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Polish Code Comment - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Fix Code format - Fix for Hamlin comment - Merge remote-tracking branch 'upstream/master' into JDK-8332587 - Fix client VM build - ... and 2 more: https://git.openjdk.org/jdk/compare/c1aa6448...142d7677 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19320/files - new: https://git.openjdk.org/jdk/pull/19320/files/e3a53408..142d7677 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=04-05 Stats: 5402 lines in 372 files changed: 3794 ins; 475 del; 1133 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From gcao at openjdk.org Fri Jun 14 03:40:36 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 14 Jun 2024 03:40:36 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3] In-Reply-To: References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com> Message-ID: On Mon, 10 Jun 2024 18:32:05 GMT, Hamlin Li wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Code format > > Thanks for updating! > > With the fix, although it improves the perf for testNegative63/64, but seems it brings some regression for testNegative55-62, in this sense the fix should not be taken. > I'll take another look, sorry for long waiting. @Hamlin-Li : Hi, Do you have any other suggestions? 
Thanks

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2167150244

From duke at openjdk.org Fri Jun 14 04:19:14 2024
From: duke at openjdk.org (Liming Liu)
Date: Fri, 14 Jun 2024 04:19:14 GMT
Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9]
In-Reply-To:
References:
Message-ID:

On Thu, 6 Jun 2024 12:38:33 GMT, Thomas Stuefe wrote:

>> Liming Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix the wrong condition
>
> Meanwhile, I am warming to the current approach. I understand that this avoids referring to individual downstream vendors, which I agree may be brittle.
>
> My main concern is to prevent future flag mismatches. Therefore, my proposal is to do what this patch does, but in a more generic way. Essentially, encoding that for certain flags, we cannot rely on older kernels correctly ignoring them. But we assume that downstream kernel vendors will at least fix conflicts when they merge in flags from mainline. We sacrifice the ability to benefit from vendor-specific backports, but that is the compromise.
>
> The flags I'd like to guard for now are:
> 1) UEK7: MADV_DONTNEED_LOCKED -> MADV_DOEXEC
> 2) UEK7: MADV_COLLAPSE -> MADV_DONTEXEC
> 3) UEK6: MADV_POPULATE_READ -> MADV_DOEXEC
> 4) UEK6: MADV_POPULATE_WRITE -> MADV_DONTEXEC
>
> If the vendor keeps up its routine of just shifting the proprietary flags to the end of the numerical MADV range for each new mainline flag, we will continue to have problems and this list may grow.
>
> The mechanism could be very close to what @limingliu-ampere does now, only a tad more generic. E.g.:
>
>
> bool os::Linux::can_use_madvise_flag(int someflag) {
>   // have a hardcoded array of { flag, kernel version } tuples.
>   // Search it for someflag, and if found, return false if host kernel version is older than the encoded version.
>   // Otherwise return true.
> }
>
>
> and then maybe wrap the madvise call with something like this:
>
>
> bool os::Linux::checked_madvise(..., someflag) {
>   assert(can_use_madvise_flag(someflag))
>   call real madvise
> }
>
>
> in addition to something like this in initialization:
>
>
> if (UseMadvPopulateWrite && ! can_use_madvise_flag(MADV_POPULATE_WRITE)) {
>   FLAG_SET_ERGO(UseMadvPopulateWrite, false);
> }
>
>
> Do you like this, does this make sense?

Hi, @tstuefe. Could you please take a look? The patch has been limited to test cases, as there were already fixes in UEK and you created a ticket to cover pretouch.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2167178974

From dholmes at openjdk.org Fri Jun 14 04:26:17 2024
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 14 Jun 2024 04:26:17 GMT
Subject: RFR: 8333962: Obsolete OldSize
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 08:17:02 GMT, Albert Mingkun Yang wrote:

> Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size.

All seems reasonable. Thanks.

src/hotspot/share/runtime/arguments.cpp line 543:

> 541:   { "UseNeon",                      JDK_Version::undefined(), JDK_Version::jdk(23), JDK_Version::jdk(24) },
> 542:   { "ScavengeBeforeFullGC",         JDK_Version::undefined(), JDK_Version::jdk(23), JDK_Version::jdk(24) },
> 543:

Please remove the blank line, but you need to sync with master to get recent updates to this table.

-------------

Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19647#pullrequestreview-2117402095 PR Review Comment: https://git.openjdk.org/jdk/pull/19647#discussion_r1639203788 From iklam at openjdk.org Fri Jun 14 05:29:44 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 14 Jun 2024 05:29:44 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v9] In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. > > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. > > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. 
E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - Fixed failures with -Xcomp - Fixed failures with -Xcomp and -Dtest.dynamic.cds.archive=true - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - Added test case for safety with putfield against final fields (related to JDK-8157181) - Moved the test ResolvedConstants.java to resolvedConstants, as we will have more tests cases in this area - @DanHeidinga comments - Fixed typo in previous commit - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - @matias9927 comments - moved remove_resolved_field_entries_if_non_deterministic() to cpCache - ... and 2 more: https://git.openjdk.org/jdk/compare/65b8c0a6...df17d34e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19355/files - new: https://git.openjdk.org/jdk/pull/19355/files/ee7a2960..df17d34e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=07-08 Stats: 1039 lines in 74 files changed: 621 ins; 215 del; 203 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From iklam at openjdk.org Fri Jun 14 06:09:22 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 14 Jun 2024 06:09:22 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v2] In-Reply-To: <1vK1vlFb91j3ilWmbMRr3PHu14ZkI8fWwU-JV4CcsQ0=.03c848b2-5424-4a6e-89fb-4b774d293fc1@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> <1vK1vlFb91j3ilWmbMRr3PHu14ZkI8fWwU-JV4CcsQ0=.03c848b2-5424-4a6e-89fb-4b774d293fc1@github.com> Message-ID: On Thu, 23 May 2024 12:36:55 GMT, Erik Joelsson wrote: >> Ioi Lam has updated the 
pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8293980-resolve-fields-at-dumptime >> - 8293980: Resolve CONSTANT_FieldRef at CDS dump time > > Build change looks good. Thanks @erikj79 @matias9927 @DanHeidinga for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/19355#issuecomment-2167274543 From iklam at openjdk.org Fri Jun 14 06:09:24 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 14 Jun 2024 06:09:24 GMT Subject: Integrated: 8293980: Resolve CONSTANT_FieldRef at CDS dump time In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: On Wed, 22 May 2024 21:48:44 GMT, Ioi Lam wrote: > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. > > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. > > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. 
We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... This pull request has now been integrated. 
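As a rough illustration of the safety conditions listed in the Overview, the check could be modeled as below. This is a toy model only: `ToyClass`, `is_same_or_superclass` and `can_archive_resolved_field` are made-up names, not HotSpot types or functions, and walking only the superclass chain is a simplification of "supertype".

```cpp
#include <cassert>

// Toy stand-in for a loaded class; not a HotSpot type.
struct ToyClass {
  const ToyClass* super;       // direct superclass, nullptr for the root class
  bool is_vm_class;            // assumed flag: class is one of the vmClasses
  bool loaded_by_boot_loader;  // assumed flag: defined by the boot class loader
};

// True if b is a itself or appears on a's superclass chain.
bool is_same_or_superclass(const ToyClass* b, const ToyClass* a) {
  for (const ToyClass* k = a; k != nullptr; k = k->super) {
    if (k == b) {
      return true;
    }
  }
  return false;
}

// Models the conditions under which a resolved FieldRef in class a that
// refers to field b.F may stay resolved in the archive.
bool can_archive_resolved_field(const ToyClass* a, const ToyClass* b,
                                bool field_is_static) {
  assert(a != nullptr && b != nullptr);
  if (field_is_static) {
    return false;  // static fields are excluded: resolution may trigger class initialization
  }
  if (is_same_or_superclass(b, a)) {
    return true;   // b is the same class as a, or a superclass of a
  }
  return b->is_vm_class && a->loaded_by_boot_loader;
}
```

In this model, step 3 of the static dump corresponds to reverting every resolved entry for which `can_archive_resolved_field` returns false.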
Changeset: b818679e Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/b818679ebafff6adb2be4edbe21245882a751d2e Stats: 1186 lines in 35 files changed: 1013 ins; 67 del; 106 mod 8293980: Resolve CONSTANT_FieldRef at CDS dump time Reviewed-by: erikj, matsaave, heidinga ------------- PR: https://git.openjdk.org/jdk/pull/19355 From fyang at openjdk.org Fri Jun 14 07:02:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Jun 2024 07:02:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 17:26:35 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists.
>> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Remove tmp file > - Prepare for dynamic NativeCall size > - Only allow one calling convetion, i.e. fixed sized > - Merge branch 'master' into 8332689 > - Review comments > - Move shart/far code to cpp > - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3732: > 3730: // Maybe emit a call via a trampoline. If the code cache is small > 3731: // trampolines won't be emitted. > 3732: address MacroAssembler::patchable_far_call(Address entry) { It doesn't look nice to me for `UseTrampolines` checks to be spread across this `MacroAssembler::patchable_far_call` function. I would suggest to keep the original `MacroAssembler::trampoline_call` and let `MacroAssembler::patchable_far_call` delegate work to it under `UseTrampolines`. What do you think? 
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4642: > 4640: } > 4641: } else { > 4642: rt_call(zero_blocks.target(), t0); Maybe simply: `rt_call(zero_blocks.target());` as `t0` is the default temp register for `rt_call`. src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1201: > 1199: // > 1200: // Old patchable far calls: (-XX:+UseTrampolines) > 1201: // - trampoline call: How about combine the two lines? Like: `- trampoline call (old patchable far call / -XX:+UseTrampolines):` src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1240: > 1238: > 1239: // Emit a direct call if the entry address will always be in range, > 1240: // otherwise a patachable far call. s/patachable/patchable/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639360499 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639349761 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639337696 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639330072 From cstein at openjdk.org Fri Jun 14 09:42:33 2024 From: cstein at openjdk.org (Christian Stein) Date: Fri, 14 Jun 2024 09:42:33 GMT Subject: RFR: 8331552: Update to use jtreg 7.4 Message-ID: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> Please review the change to update to using `jtreg` **7.4**. The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files. 
Testing: _tier1-tier5 pending..._ ------------- Commit messages: - 8331552: Update to use jtreg 7.4 Changes: https://git.openjdk.org/jdk/pull/19052/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19052&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331552 Stats: 12 lines in 8 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19052.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19052/head:pull/19052 PR: https://git.openjdk.org/jdk/pull/19052 From ayang at openjdk.org Fri Jun 14 10:19:47 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 14 Jun 2024 10:19:47 GMT Subject: RFR: 8333962: Obsolete OldSize [v2] In-Reply-To: References: Message-ID: > Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: obsolete-old-size ------------- Changes: https://git.openjdk.org/jdk/pull/19647/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19647&range=01 Stats: 192 lines in 15 files changed: 7 ins; 168 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/19647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19647/head:pull/19647 PR: https://git.openjdk.org/jdk/pull/19647 From ihse at openjdk.org Fri Jun 14 10:51:17 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 14 Jun 2024 10:51:17 GMT Subject: RFR: 8331552: Update to use jtreg 7.4 In-Reply-To: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> References: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> Message-ID: <-YWkhRJhUGImRoIujOxkT8oL3voSNK0s1ERO8r6b60E=.fc9ff806-5eda-4f50-a5f5-11b5616923d4@github.com> On Thu, 2 May 2024 09:48:51 GMT, Christian Stein wrote: > Please review the change to update to using `jtreg` **7.4**. 
> > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files. > > Testing: _tier1-tier5 pending..._ Looks good to me. I assume that you have run an extensive set of tests to verify that this does not break, even in higher tiers? ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19052#pullrequestreview-2118094672 From jbhateja at openjdk.org Fri Jun 14 10:54:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 14 Jun 2024 10:54:01 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v2] In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: > Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. > > Summary of changes introduced along with this patch:- > > 1. C2 compiler register allocation support. > 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. > 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. > 4. Applicable extensions to native interface used by runtime for patching instruction. > > We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits > (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves > remaining register for special purpose.
> > Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. > > We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes > found during testing. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 32-bit build fixes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19042/files - new: https://git.openjdk.org/jdk/pull/19042/files/cc9ca4b4..e92349ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=00-01 Stats: 11 lines in 5 files changed: 6 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From mdoerr at openjdk.org Fri Jun 14 12:01:43 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 14 Jun 2024 12:01:43 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v5] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 14:35:42 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test avgt 15 432.366 ±
8.364 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
>> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
>> SecondarySupersLookup.testNegative61 avgt 15 ...
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Remove pointless assertion. Thanks for reviewing! Your suggestions are all good (see my update). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2167868849 From mdoerr at openjdk.org Fri Jun 14 12:01:43 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 14 Jun 2024 12:01:43 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? > How can we verify it? By comparing the performance using the micro benchmarks? > > Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 12.552 ±
0.122 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op
> SecondarySupersLookup.testNegative62 avgt 15 ...
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Minor improvements according to review suggestions.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19368/files - new: https://git.openjdk.org/jdk/pull/19368/files/1736aa6a..bea2f938 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=04-05 Stats: 17 lines in 3 files changed: 0 ins; 2 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368 PR: https://git.openjdk.org/jdk/pull/19368 From mli at openjdk.org Fri Jun 14 12:22:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Jun 2024 12:22:14 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 03:40:36 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. >> >> ### Correctness testing: >> >> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) >> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) >> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` >> >> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs >> Original:
>>
>> Benchmark Mode Cnt Score Error Units
>> SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op
>> SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 0.239 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 21.695 ± 0.458 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 24.855 ±
0.664 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 29.719 ± 0.385 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 35.856 ± 0.629 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 126.796 ± 0.916 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op
>> Seco...
> > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8332587 > - Update ins_cost for PartialSubtypeCheck > - Code Format > - Merge remote-tracking branch 'upstream/master' into JDK-8332587 > - Polish Code Comment > - Merge remote-tracking branch 'upstream/master' into JDK-8332587 > - Fix Code format > - Fix for Hamlin comment > - Merge remote-tracking branch 'upstream/master' into JDK-8332587 > - Fix client VM build > - ... and 2 more: https://git.openjdk.org/jdk/compare/8daf2fea...142d7677 Thanks for your patience.
I was thinking to jump to `L_bitmap_full` in `lookup_secondary_supers_table_slow_path`, in this way I guess it might address the performance issue when bitmap full, and not introduce regression in other cases. But I'm not sure how much complication will be brought into the implementation. So, let's skip this rare case optimization. Some minor comment, otherwise looks good. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5613: > 5611: } > 5612: > 5613: #ifdef COMPILER2 Maybe put other "secondary super table" related code also inside COMPILER2 macro? ------------- PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2118147392 PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1639676041 From rrich at openjdk.org Fri Jun 14 12:38:13 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 14 Jun 2024 12:38:13 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 12:01:43 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.) >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
>> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 12.300 ±
0.039 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
>> Sec...
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Minor improvements according to review suggestions. Looks good! Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19368#pullrequestreview-2118279135 From gcao at openjdk.org Fri Jun 14 12:42:14 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 14 Jun 2024 12:42:14 GMT Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 03:24:15 GMT, Gui Cao wrote: > Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information. > > As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases. > > After this patch, we can get cpu feature string like this: > > ----------System.out:(4/168)---------- > WB.getCPUFeatures(): "rv64 rvi rvm rva rvf rvd rvc rvv" > CPUInfo.getAdditionalCPUInfo(): "" > CPUInfo.getFeatures(): [rv64, rvi, rvm, rva, rvf, rvd, rvc, rvv] > TEST PASSED > > > ### Testing > - [x] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) > - [x] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug) > - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug) @Hamlin-Li : Are you OK with this change? This touches tests added by your previous PRs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19686#issuecomment-2167946318 From rehn at openjdk.org Fri Jun 14 12:54:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Jun 2024 12:54:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Tue, 11 Jun 2024 20:32:47 GMT, Hamlin Li wrote: >> I am considering names like `indirect_jump_link` :-) > > I'm not sure, but better have a `jump` in its name, just `load` is misleading. 
update ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639774007 From rehn at openjdk.org Fri Jun 14 12:54:20 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Jun 2024 12:54:20 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 20:36:03 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - Cleanup >> - ... and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 986: > >> 984: assert_cond(source != nullptr); >> 985: int64_t distance = source - pc(); >> 986: assert(is_simm32(distance), "Must be"); > > seems load_link can jump to about +/-2G dest from pc, jump_link seems support full address range jump (e.g. 48 bits)? The load can only happen within +/-2 GB of the pc, but the address loaded can be any 64-bit value.
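To make the reach argument concrete, here is a minimal sketch of how a pc-relative distance splits into an AUIPC upper immediate (20 bits) and a signed 12-bit load offset, which is where the roughly +/-2 GB limit for the load comes from. The helper names (`PcRelSplit`, `split_pc_relative`, `in_auipc_reach`) are made up for illustration and are not the HotSpot functions; the 64-bit value that the LD fetches is unconstrained, so the final JALR target can be anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not HotSpot code.
struct PcRelSplit {
  int64_t hi20;  // value for the AUIPC immediate (contributes hi20 * 4096)
  int64_t lo12;  // signed 12-bit offset for the following LD
};

// Split distance so that distance == hi20 * 4096 + lo12 with lo12 in [-2048, 2047].
PcRelSplit split_pc_relative(int64_t distance) {
  // Sign-extend the low 12 bits of distance.
  const int64_t lo =
      static_cast<int64_t>((static_cast<uint64_t>(distance) & 0xFFF) ^ 0x800) - 0x800;
  // distance - lo is an exact multiple of 4096, so the division is exact.
  const int64_t hi = (distance - lo) / 4096;
  assert(hi * 4096 + lo == distance);
  return {hi, lo};
}

// True if the memory slot at pc + distance can be addressed by AUIPC + LD,
// i.e. if the upper part fits in a signed 20-bit immediate.
bool in_auipc_reach(int64_t distance) {
  const int64_t limit = int64_t{1} << 32;  // coarse pre-check, also avoids overflow
  if (distance < -limit || distance > limit) {
    return false;
  }
  const PcRelSplit s = split_pc_relative(distance);
  return s.hi20 >= -(int64_t{1} << 19) && s.hi20 < (int64_t{1} << 19);
}
```

Note that the rounding of the low 12 bits shifts the exact window slightly below the plain signed 32-bit range, which the sketch reflects by checking the upper part after the split.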
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639775430 From rehn at openjdk.org Fri Jun 14 12:54:24 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Jun 2024 12:54:24 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v7] In-Reply-To: References: <3UcoOmZTsZJBu0ZkOZQ21-ynr1slrOGCDuWYFnJQz1U=.6467ec54-cd28-4aae-a90c-6b6858f37986@github.com> Message-ID: On Thu, 6 Jun 2024 21:07:16 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove tmp file > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 519: > >> 517: >> 518: address NativeCall::instruction_address() const { >> 519: if (!UseTrampolines) { > > maybe use positive condition check? similar suggestion for below conditions. ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639773666 From rehn at openjdk.org Fri Jun 14 12:54:23 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Jun 2024 12:54:23 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: References: Message-ID: <7L-xX_2BmAPm_XnAb7PYIKCkSX0j0rAnSUUL8XSv1QA=.bf49c924-c501-4c7c-a757-d54aae13b328@github.com> On Fri, 14 Jun 2024 06:58:14 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3732: > >> 3730: // Maybe emit a call via a trampoline. If the code cache is small >> 3731: // trampolines won't be emitted. >> 3732: address MacroAssembler::patchable_far_call(Address entry) { > > It doesn't look nice to me for `UseTrampolines` checks to be spread across this `MacroAssembler::patchable_far_call` function. I would suggest to keep the original `MacroAssembler::trampoline_call` and let `MacroAssembler::patchable_far_call` delegate work to it under `UseTrampolines`. What do you think? fixed > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4642: > >> 4640: } >> 4641: } else { >> 4642: rt_call(zero_blocks.target(), t0); > > Maybe simply: `rt_call(zero_blocks.target());` as `t0` is the default temp register for `rt_call`. ok > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1201: > >> 1199: // >> 1200: // Old patchable far calls: (-XX:+UseTrampolines) >> 1201: // - trampoline call: > > How about combine the two lines? Like: > `- trampoline call (old patchable far call / -XX:+UseTrampolines):` ok > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1240: > >> 1238: >> 1239: // Emit a direct call if the entry address will always be in range, >> 1240: // otherwise a patachable far call. 
>
> s/patachable/patchable/

fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639778686
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639778123
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639776855
PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639776675

From erikj at openjdk.org Fri Jun 14 12:56:23 2024
From: erikj at openjdk.org (Erik Joelsson)
Date: Fri, 14 Jun 2024 12:56:23 GMT
Subject: RFR: 8331552: Update to use jtreg 7.4
In-Reply-To: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com>
References: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com>
Message-ID:

On Thu, 2 May 2024 09:48:51 GMT, Christian Stein wrote:

> Please review the change to update to using `jtreg` **7.4**.
>
> The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files.
>
> Testing: _tier1-tier5 pending..._

Marked as reviewed by erikj (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19052#pullrequestreview-2118319536

From amitkumar at openjdk.org Fri Jun 14 13:12:43 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 14 Jun 2024 13:12:43 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
Message-ID:

s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)

I ran the `tier1` tests with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in a fastdebug VM and didn't see any new failures, except the one I mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693), but that one is reproducible on every other architecture with these flags.

Without Patch:

SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op
SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op
SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op
SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op

Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op
SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op
SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op
SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op
SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op
SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op
SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op
SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op
SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op
SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op
SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op
SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op
SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op
SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op
SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op
SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op
SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op
SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op
SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op
SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op
SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op
SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op
SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ns/op
SecondarySupersLookup.testNegative61 avgt 15 27.763 ± 1.628 ns/op
SecondarySupersLookup.testNegative62 avgt 15 27.859 ± 0.574 ns/op
SecondarySupersLookup.testNegative63 avgt 15 28.333 ± 0.763 ns/op
SecondarySupersLookup.testNegative64 avgt 15 29.325 ± 2.331 ns/op
SecondarySupersLookup.testPositive01 avgt 15 1.759 ± 0.259 ns/op
SecondarySupersLookup.testPositive02 avgt 15 2.664 ± 0.192 ns/op
SecondarySupersLookup.testPositive03 avgt 15 3.156 ± 0.317 ns/op
SecondarySupersLookup.testPositive04 avgt 15 3.544 ± 0.243 ns/op
SecondarySupersLookup.testPositive05 avgt 15 4.038 ± 0.267 ns/op
SecondarySupersLookup.testPositive06 avgt 15 4.350 ± 0.172 ns/op
SecondarySupersLookup.testPositive07 avgt 15 4.754 ± 0.223 ns/op
SecondarySupersLookup.testPositive08 avgt 15 5.183 ± 0.232 ns/op
SecondarySupersLookup.testPositive09 avgt 15 5.676 ± 0.267 ns/op
SecondarySupersLookup.testPositive10 avgt 15 6.022 ± 0.219 ns/op
SecondarySupersLookup.testPositive16 avgt 15 8.647 ± 0.317 ns/op
SecondarySupersLookup.testPositive20 avgt 15 10.668 ± 0.318 ns/op
SecondarySupersLookup.testPositive30 avgt 15 15.355 ± 0.646 ns/op
SecondarySupersLookup.testPositive32 avgt 15 15.958 ± 0.364 ns/op
SecondarySupersLookup.testPositive40 avgt 15 19.227 ± 0.571 ns/op
SecondarySupersLookup.testPositive50 avgt 15 26.029 ± 3.961 ns/op
SecondarySupersLookup.testPositive60 avgt 15 30.197 ± 3.974 ns/op
SecondarySupersLookup.testPositive63 avgt 15 31.863 ± 3.963 ns/op
SecondarySupersLookup.testPositive64 avgt 15 31.466 ± 4.002 ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC avgt 12 12108.484 ± 144.972 ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC avgt 12 10926.627 ± 66.176 ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC avgt 12 11914.055 ± 182.944 ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC avgt 12 10966.667 ± 68.248 ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC avgt 12 243.053 ± 1.857 ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC avgt 12 219.391 ± 0.630 ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC avgt 12 242.222 ± 0.620 ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC avgt 12 218.529 ± 1.236 ms/op

With Patch:

SecondarySuperCacheHits.test avgt 15 0.927 ± 0.005 ns/op
SecondarySuperCacheInterContention.test avgt 15 1.414 ± 0.014 ns/op
SecondarySuperCacheInterContention.test:t1 avgt 15 1.423 ± 0.027 ns/op
SecondarySuperCacheInterContention.test:t2 avgt 15 1.405 ± 0.022 ns/op
SecondarySupersLookup.testNegative00 avgt 15 1.352 ± 0.009 ns/op
SecondarySupersLookup.testNegative01 avgt 15 1.409 ± 0.160 ns/op
SecondarySupersLookup.testNegative02 avgt 15 1.353 ± 0.008 ns/op
SecondarySupersLookup.testNegative03 avgt 15 1.411 ± 0.161 ns/op
SecondarySupersLookup.testNegative04 avgt 15 1.353 ± 0.009 ns/op
SecondarySupersLookup.testNegative05 avgt 15 1.353 ± 0.010 ns/op
SecondarySupersLookup.testNegative06 avgt 15 1.380 ± 0.080 ns/op
SecondarySupersLookup.testNegative07 avgt 15 1.354 ± 0.012 ns/op
SecondarySupersLookup.testNegative08 avgt 15 1.353 ± 0.010 ns/op
SecondarySupersLookup.testNegative09 avgt 15 1.354 ± 0.010 ns/op
SecondarySupersLookup.testNegative10 avgt 15 1.353 ± 0.008 ns/op
SecondarySupersLookup.testNegative16 avgt 15 1.353 ± 0.010 ns/op
SecondarySupersLookup.testNegative20 avgt 15 1.353 ± 0.010 ns/op
SecondarySupersLookup.testNegative30 avgt 15 1.353 ± 0.010 ns/op
SecondarySupersLookup.testNegative32 avgt 15 1.354 ± 0.009 ns/op
SecondarySupersLookup.testNegative40 avgt 15 1.368 ± 0.055 ns/op
SecondarySupersLookup.testNegative50 avgt 15 1.353 ± 0.010 ns/op
SecondarySupersLookup.testNegative55 avgt 15 5.166 ± 0.121 ns/op
SecondarySupersLookup.testNegative56 avgt 15 5.147 ± 0.070 ns/op
SecondarySupersLookup.testNegative57 avgt 15 5.150 ± 0.074 ns/op
SecondarySupersLookup.testNegative58 avgt 15 5.144 ± 0.063 ns/op
SecondarySupersLookup.testNegative59 avgt 15 5.142 ± 0.062 ns/op
SecondarySupersLookup.testNegative60 avgt 15 9.679 ± 0.434 ns/op
SecondarySupersLookup.testNegative61 avgt 15 9.587 ± 0.119 ns/op
SecondarySupersLookup.testNegative62 avgt 15 9.570 ± 0.056 ns/op
SecondarySupersLookup.testNegative63 avgt 15 28.957 ± 2.511 ns/op
SecondarySupersLookup.testNegative64 avgt 15 29.815 ± 3.158 ns/op
SecondarySupersLookup.testPositive01 avgt 15 1.680 ± 0.114 ns/op
SecondarySupersLookup.testPositive02 avgt 15 1.683 ± 0.118 ns/op
SecondarySupersLookup.testPositive03 avgt 15 1.680 ± 0.113 ns/op
SecondarySupersLookup.testPositive04 avgt 15 1.681 ± 0.115 ns/op
SecondarySupersLookup.testPositive05 avgt 15 1.690 ± 0.134 ns/op
SecondarySupersLookup.testPositive06 avgt 15 1.682 ± 0.117 ns/op
SecondarySupersLookup.testPositive07 avgt 15 1.683 ± 0.119 ns/op
SecondarySupersLookup.testPositive08 avgt 15 1.687 ± 0.115 ns/op
SecondarySupersLookup.testPositive09 avgt 15 1.683 ± 0.117 ns/op
SecondarySupersLookup.testPositive10 avgt 15 1.681 ± 0.115 ns/op
SecondarySupersLookup.testPositive16 avgt 15 1.681 ± 0.115 ns/op
SecondarySupersLookup.testPositive20 avgt 15 1.681 ± 0.114 ns/op
SecondarySupersLookup.testPositive30 avgt 15 1.688 ± 0.130 ns/op
SecondarySupersLookup.testPositive32 avgt 15 3.059 ± 0.117 ns/op
SecondarySupersLookup.testPositive40 avgt 15 3.063 ± 0.124 ns/op
SecondarySupersLookup.testPositive50 avgt 15 1.689 ± 0.133 ns/op
SecondarySupersLookup.testPositive60 avgt 15 3.060 ± 0.118 ns/op
SecondarySupersLookup.testPositive63 avgt 15 27.292 ± 0.252 ns/op
SecondarySupersLookup.testPositive64 avgt 15 28.949 ± 3.108 ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC avgt 12 11872.844 ± 199.007 ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC avgt 12 10979.060 ± 41.874 ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC avgt 12 4770.714 ± 10.642 ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC avgt 12 4840.972 ± 18.488 ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC avgt 12 242.041 ± 2.459 ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC avgt 12 218.714 ± 1.016 ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC avgt 12 93.759 ± 1.082 ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC avgt 12 94.644 ± 0.275 ms/op

-------------

Commit messages:
 - [s390x] secondary super cache port

Changes: https://git.openjdk.org/jdk/pull/19544/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8331126
Stats: 434 lines in 5 files changed: 433 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/19544.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From rehn at openjdk.org Fri Jun 14 13:25:32 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 14 Jun 2024 13:25:32 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v12]
In-Reply-To:
References:
Message-ID:

> Hi all, please consider!
>
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> Using a very small application or running for a very short time, we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
>
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
>
> Code stream:
> JAL
> Stubs:
> AUIPC
> LD
> JALR
>
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
>
> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
>
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
>
> An experimental option for turning trampolines back on exists.
>
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>
> fop (msec) 2239 | 2128 = 0.950424
> h2 (msec) 18660 | 16594 = 0.889282
> jython (msec) 22022 | 21925 = 0.995595
> luindex (msec) 2866 | 2842 = 0.991626
> lusearch (msec) 4108 | 4311 = 1.04942
> lusearch-fix (msec) 4406 | 4116 = 0.934181
> pmd (msec) 5976 | 5897 = 0.98678
> jython (msec) 22022 | 21925 = 0.995595
> Avg: 0.974112
> fop(xcomp) (msec) 2721 | 2714 = 0.997427
> h2(xcomp) (msec) 37719 | 38004 = 1.00756
> jython(xcomp) ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:

 - Review comments, removed dead code.
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Remove tmp file
 - Prepare for dynamic NativeCall size
 - Only allow one calling convetion, i.e. fixed sized
 - Merge branch 'master' into 8332689
 - ...
and 8 more: https://git.openjdk.org/jdk/compare/cc64aeac...f1dd3e16

-------------

Changes: https://git.openjdk.org/jdk/pull/19453/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=11
Stats: 914 lines in 16 files changed: 652 ins; 167 del; 95 mod
Patch: https://git.openjdk.org/jdk/pull/19453.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From mdoerr at openjdk.org Fri Jun 14 13:38:29 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 14 Jun 2024 13:38:29 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 12:01:43 GMT, Martin Doerr wrote:

>> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
>> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.)
>> How can we verify it? By comparing the performance using the micro benchmarks?
>>
>> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
>> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
>> Sec...
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor improvements according to review suggestions.

Thanks for the review!
I've tried some very simple statistics to check that the first lookup directly hits as expected:

java -Xcomp -XX:-TieredCompilation -version
secondary supers direct hit ratio: 537 / 539
(the other 2 were hit with the first check in the first slow path loop)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2168070881

From aph at openjdk.org Fri Jun 14 13:39:18 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 14 Jun 2024 13:39:18 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3]
In-Reply-To:
References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com>
Message-ID:

On Fri, 7 Jun 2024 09:45:29 GMT, Gui Cao wrote:

> There are a bit regression in cases of testNegative63/64, although these might be rare cases or not very common cases, but it's worth to have a try to improve it if possible. I guess it's related to the implementation for the cases when bitmap is full. When it's full, before go to `repne_scan`, there're some instructions to execute. I wonder if it will help to have another "bitmap full test" just after "bitmap false test" (which is `test_bit(t0, r_bitmap, bit);`). But I'm not sure if it's feasible, maybe worth a try.

So many superinterfaces is very rare. So rare, in fact, that it may never have happened in production Java code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2168073615

From mbaesken at openjdk.org Fri Jun 14 13:44:42 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 14 Jun 2024 13:44:42 GMT
Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions
Message-ID:

A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, even though what they do is still valid).
We can simplify this by introducing a macro (similar to the asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp).
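
As a rough sketch of what such a macro could look like: the name `ATTRIBUTE_NO_UBSAN` and the function `hash_step` are assumptions made for illustration, modeled on ATTRIBUTE_NO_ASAN, and are not taken from the patch. The function body is deliberately well-defined (unsigned wraparound), so the sketch behaves the same with or without sanitizers and only shows where the attribute would go.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro name, modeled on ATTRIBUTE_NO_ASAN; the name used in
// the actual patch may differ. Expands to nothing on other compilers.
#if defined(__clang__) || defined(__GNUC__)
#define ATTRIBUTE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define ATTRIBUTE_NO_UBSAN
#endif

// Illustrative function: one annotation replaces the three-line #if block
// at every excluded function.
ATTRIBUTE_NO_UBSAN
static uint32_t hash_step(uint32_t h, uint32_t v) {
  return h * 31u + v;  // unsigned arithmetic wraps modulo 2^32 by definition
}
```

With a definition like this, each excluded function would carry a single `ATTRIBUTE_NO_UBSAN` line instead of repeating the compiler check.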
Currently something like this is used:

#if defined(__clang__) || defined(__GNUC__)
__attribute__((no_sanitize("undefined")))
#endif

-------------

Commit messages:
 - JDK-8334239

Changes: https://git.openjdk.org/jdk/pull/19722/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8334239
Stats: 58 lines in 4 files changed: 46 ins; 9 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/19722.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19722/head:pull/19722

PR: https://git.openjdk.org/jdk/pull/19722

From aph at openjdk.org Fri Jun 14 13:51:17 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 14 Jun 2024 13:51:17 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 03:40:36 GMT, Gui Cao wrote:

>> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
>> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
>> Original:
>>
>> Benchmark Mode Cnt Score Error Units
>> SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op
>> SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 0.239 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 21.695 ± 0.458 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 24.855 ± 0.664 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 29.719 ± 0.385 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 35.856 ± 0.629 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 126.796 ± 0.916 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op
>> Seco...
>
> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
>
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Update ins_cost for PartialSubtypeCheck
> - Code Format
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Polish Code Comment
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Fix Code format
> - Fix for Hamlin comment
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Fix client VM build
> - ...
and 2 more: https://git.openjdk.org/jdk/compare/986c0f2a...142d7677 It's worth running the "before" test with `-XX:-UseSecondarySupersCache`. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2168094217 From gziemski at openjdk.org Fri Jun 14 14:48:16 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 14 Jun 2024 14:48:16 GMT Subject: RFR: 8333994: NMT: call stacks should show source information In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 12:38:09 GMT, Thomas Stuefe wrote: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. I started looking at it yesterday and should be done today. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2168199309 From mli at openjdk.org Fri Jun 14 14:55:22 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Jun 2024 14:55:22 GMT Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 03:24:15 GMT, Gui Cao wrote: > Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information. > > As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases. 
>
> After this patch, we can get cpu feature string like this:
>
> ----------System.out:(4/168)----------
> WB.getCPUFeatures(): "rv64 rvi rvm rva rvf rvd rvc rvv"
> CPUInfo.getAdditionalCPUInfo(): ""
> CPUInfo.getFeatures(): [rv64, rvi, rvm, rva, rvf, rvd, rvc, rvv]
> TEST PASSED
>
>
> ### Testing
> - [x] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
> - [x] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug)
> - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug)

Thanks for fixing the issue. Some comments.

src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 137:

> 135: // like rvc, rvv, etc so that it will be easier to specify
> 136: // target feature string in tests.
> 137: strcat(buf, " rv");

Could there be a naming conflict in the future, i.e. an extension that is itself named rvd, etc.?
I'm not sure whether the current RISC-V extension naming convention rules this situation out; if it does, then this looks good.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19686#pullrequestreview-2118567094
PR Review Comment: https://git.openjdk.org/jdk/pull/19686#discussion_r1639938403

From mli at openjdk.org Fri Jun 14 14:55:23 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 14 Jun 2024 14:55:23 GMT
Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 14:42:52 GMT, Hamlin Li wrote:

>> Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information.
>>
>> As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases.
>> >> After this patch, we can get cpu feature string like this: >> >> ----------System.out:(4/168)---------- >> WB.getCPUFeatures(): "rv64 rvi rvm rva rvf rvd rvc rvv" >> CPUInfo.getAdditionalCPUInfo(): "" >> CPUInfo.getFeatures(): [rv64, rvi, rvm, rva, rvf, rvd, rvc, rvv] >> TEST PASSED >> >> >> ### Testing >> - [x] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) >> - [x] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug) >> - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug) > > src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 137: > >> 135: // like rvc, rvv, etc so that it will be easier to specify >> 136: // target feature string in tests. >> 137: strcat(buf, " rv"); > > Could there be any naming conflicting in the future? ie. there will be extension named rvd, etc. > I'm not sure if the current riscv extension naming convention will avoid this situation, if answer is postive, then it looks good. Another fix (avoid any potential naming conflict in the future) could be use `CPUInfo.getFeatures()` instead of WHITE_BOX.getCPUFeatures() at: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L419 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19686#discussion_r1639950284 From mli at openjdk.org Fri Jun 14 15:13:19 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Jun 2024 15:13:19 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v12] In-Reply-To: References: Message-ID: <6VSNVWFHUzGmj5azdsNDgoacEozSBC8FjcC_CWc87Ag=.a798f861-62ef-4668-b3fe-ffce52d0b05b@github.com> On Fri, 14 Jun 2024 13:25:32 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. 
>> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. >> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Review comments, removed dead code. 
> - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Remove tmp file > - Prepare for dynamic NativeCall size > - Only allow one calling convetion, i.e. fixed sized > - Merge branch 'master' into 8332689 > - ... and 8 more: https://git.openjdk.org/jdk/compare/cc64aeac...f1dd3e16 Some comments and questions. (Sorry, I might have misread some code) ------------- PR Review: https://git.openjdk.org/jdk/pull/19453#pullrequestreview-2112378411 From mli at openjdk.org Fri Jun 14 15:13:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Jun 2024 15:13:23 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 11:17:23 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. 
>> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Remove tmp file > - Prepare for dynamic NativeCall size > - Only allow one calling convetion, i.e. fixed sized > - Merge branch 'master' into 8332689 > - Review comments > - Move shart/far code to cpp > - Cleanup > - ... and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 982: > 980: > 981: void MacroAssembler::load_link(const address source, Register temp) { > 982: assert(temp != noreg && temp != x0, "expecting a register"); with `temp == x5`, this assert is redundant. A question, why require `temp == x5`? 
src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 60:

> 58: };
> 59:
> 60: address destination(nmethod *nm = nullptr) const;

Unused argument.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 61:

> 59:
> 60: address destination(nmethod *nm = nullptr) const;
> 61: void set_destination(address new_destination);

Unused method.

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 114:

> 112: // Creation
> 113: friend NativeCall* nativeCall_at(address addr);
> 114: friend NativeCall* nativeCall_before(address return_address);

Are these friend declarations necessary?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 321:

> 319: }
> 320:
> 321: void NativeShortCall::replace_mt_safe(address instr_addr, address code_buffer) {

There seems to be no usage of, and no need for, the 2 methods `replace_mt_safe` and `insert`?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 331:

> 329: // Creation
> 330: friend NativeCall* nativeCall_at(address addr);
> 331: friend NativeCall* nativeCall_before(address return_address);

Are these friend declarations necessary?

src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 510:

> 508: }
> 509:
> 510: void NativeFarCall::replace_mt_safe(address instr_addr, address code_buffer) {

There seems to be no usage of, and no need for, the 2 methods `replace_mt_safe` and `insert`?
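
For readers following the "JAL to **dest** if **dest** is in reach (+/- 1 MB)" part of the quoted description, here is a minimal sketch of the reach check and of the J-type immediate layout that patching a JAL has to respect. It assumes only the standard RISC-V J-type encoding; the names `jal_in_reach`, `encode_jal` and `decode_jal_offset` are hypothetical and are not HotSpot functions.

```cpp
#include <cstdint>

// Hypothetical helper: a JAL offset is a signed 21-bit value with bit 0
// implied as zero, so the reachable range is [-2^20, 2^20 - 2] (about +/- 1 MB).
constexpr bool jal_in_reach(int64_t offset) {
  return offset >= -(int64_t{1} << 20) &&
         offset <   (int64_t{1} << 20) &&
         (offset & 1) == 0;
}

// Hypothetical helper: build a JAL instruction word for destination
// register rd and a byte offset that is already known to be in reach.
// J-type layout: imm[20] -> bit 31, imm[10:1] -> bits 30:21,
//                imm[11] -> bit 20, imm[19:12] -> bits 19:12.
constexpr uint32_t encode_jal(uint32_t rd, int32_t offset) {
  return ((((static_cast<uint32_t>(offset) >> 20) & 0x1)   << 31) |
          (((static_cast<uint32_t>(offset) >> 1)  & 0x3ff) << 21) |
          (((static_cast<uint32_t>(offset) >> 11) & 0x1)   << 20) |
          (((static_cast<uint32_t>(offset) >> 12) & 0xff)  << 12) |
          ((rd & 0x1f) << 7) |
          0x6f);  // JAL opcode
}

// Hypothetical helper: recover the signed byte offset from a JAL word.
constexpr int32_t decode_jal_offset(uint32_t insn) {
  // Reassemble the 21-bit immediate, then sign-extend from bit 20.
  return static_cast<int32_t>(
             ((((insn >> 31) & 0x1)   << 20) |
              (((insn >> 21) & 0x3ff) << 1)  |
              (((insn >> 20) & 0x1)   << 11) |
              (((insn >> 12) & 0xff)  << 12)) ^ 0x100000u) - 0x100000;
}
```

When `jal_in_reach` is false, a single JAL cannot reach the destination, which is why the trampoline stub (or the AUIPC + LD + JALR sequence proposed in this PR) is needed.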
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1636082767 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1636076065 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1636076377 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1636145069 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1636720621 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1636145289 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1636722377 From mli at openjdk.org Fri Jun 14 15:13:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 14 Jun 2024 15:13:30 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: References: Message-ID: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> On Thu, 13 Jun 2024 17:26:35 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. 
>> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Remove tmp file > - Prepare for dynamic NativeCall size > - Only allow one calling convetion, i.e. fixed sized > - Merge branch 'master' into 8332689 > - Review comments > - Move shart/far code to cpp > - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 109: > 107: > 108: //----------------------------------------------------------------------------- > 109: // NativeShortCall Both Far and Short call here are named `patchable far calls` in the comment in macroAssembler_riscv.hpp. 
So, it will be helpful to unify the naming. src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 169: > 167: address addr = addr_at(0); > 168: if (NativeShortCall::is_at(addr)) { > 169: NativeShortCall* call = NativeShortCall::at(addr); Are these lines necessary? As this is an instance method (rather than static), so `NativeShortCall::is_at(addr)` must already be true? src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 198: > 196: Assembler::patch(pInsn, 30, 21, (offset >> 1) & 0x3ff); > 197: Assembler::patch(pInsn, 20, 20, (offset >> 11) & 0x1); > 198: Assembler::patch(pInsn, 19, 12, (offset >> 12) & 0xff); should we reuse `MacroAssembler::pd_patch_instruction_size`? src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 248: > 246: } > 247: > 248: bool NativeShortCall::reloc_set_destination(address dest) { `reloc_set_destination` and `set_destination_mt_safe` are almost same, maybe `set_destination_mt_safe` could call `reloc_set_destination`? src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 257: > 255: assert(!NativeShortCallTrampolineStub::is_at(dest), "chained trampolines"); > 256: NativeShortCallTrampolineStub::at(trampoline_stub_addr)->set_destination(dest); > 257: } Maybe move these lines into `else` block below? as `Assembler::reachable_from_branch_at(call_addr, dest)` condition check does not depends on these `trampoline_stub_addr` related check & set. src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 382: > 380: } > 381: > 382: address NativeFarCall::reloc_destination(address orig_address) { argument `orig_address` is not used src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 387: > 385: CodeBlob *code = CodeCache::find_blob(call_addr); > 386: assert(code != nullptr, "Could not find the containing code blob"); > 387: address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code); should there be an assert like `assert(code->is_nmethod())`? 
src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 392: > 390: stub_addr = MacroAssembler::target_addr_for_insn(call_addr); > 391: } > 392: return stub_addr; Naming here is confusing, as the returned value is not stub addr, but target addr of a jump. Suggestion: if (stub_addr != nullptr) { return MacroAssembler::target_addr_for_insn(call_addr); } return nullptr; src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 410: > 408: } > 409: > 410: bool NativeFarCall::set_destination_mt_safe(address dest, bool assert_lock) { Seems no caller will pass `assert_lock == false` src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 410: > 408: } > 409: > 410: bool NativeFarCall::set_destination_mt_safe(address dest, bool assert_lock) { For NativeShortCall, reloc_set_destination and set_destination_mt_safe are almost same, but for NativeFarCall they're different, is this expected? src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 423: > 421: > 422: if (stub_addr != nullptr) { > 423: set_stub_address_destination_at(stub_addr, dest); Is `ICache::invalidate_range` needed here? src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 430: > 428: } > 429: > 430: bool NativeFarCall::reloc_set_destination(address dest) { argument `dest` is not used. src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 436: > 434: CodeBlob *code = CodeCache::find_blob(call_addr); > 435: assert(code != nullptr, "Could not find the containing code blob"); > 436: address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code); should there be an assert like `assert(code->is_nmethod())`? src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 438: > 436: address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code); > 437: > 438: if (stub_addr != nullptr) { Could `stub_addr == nullptr`? If positive, then it should return false when it's nullptr, if negative, then should the `if` be converted to an `assert`? 
src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 439: > 437: > 438: if (stub_addr != nullptr) { > 439: MacroAssembler::pd_patch_instruction_size(call_addr, stub_addr); I could be wrong. `stub_addr` should be `dest`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638848549 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639489946 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639598854 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639653121 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639544121 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638764917 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638834002 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638840966 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639630967 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639654496 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639648835 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638767104 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638834176 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1638873919 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1639597667 From lmesnik at openjdk.org Fri Jun 14 15:35:25 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 14 Jun 2024 15:35:25 GMT Subject: Integrated: 8330702: Update failure handler to don't generate Error message if cores actions are empty In-Reply-To: References: Message-ID: <1GboUj_AdcJeyF1gi8Km6mPYs5ph3oiN7DjN8_wc5n8=.18ff2151-3eb3-4ad9-b268-c43d7136a241@github.com> On Thu, 30 May 2024 02:28:56 GMT, Leonid Mesnik wrote: > The message is generated if cores (or any other tools) section doesn't exist or is empty. 
However, there is no any tool for cores processing now defined. So ERROR message is generating, confusing users. > The fix is to don't print error for empty toolset which is the valid case. The message is still generate is tool is not defined to get error message in the case of miswriting. This pull request has now been integrated. Changeset: 548e95a6 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/548e95a689d63e97ddbdfe7dd7df3a2e3377046c Stats: 9 lines in 2 files changed: 5 ins; 0 del; 4 mod 8330702: Update failure handler to don't generate Error message if cores actions are empty Reviewed-by: sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/19470 From amitkumar at openjdk.org Fri Jun 14 15:38:20 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 14 Jun 2024 15:38:20 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 12:01:43 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.) >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): >> >> Original >> SecondarySuperCacheHits: 13.033 ?(99.9%) 0.058 ns/op [Average] >> SecondarySuperCacheInterContention.test avgt 15 432.366 ? 8.364 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ? 8.460 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ? 10.819 ns/op >> SecondarySuperCacheIntraContention.test avgt 15 355.192 ? 3.597 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 12.274 ? 0.026 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 12.300 ? 0.039 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 12.304 ? 
0.034 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 12.276 ? 0.050 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 12.235 ? 0.044 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 12.308 ? 0.156 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 12.291 ? 0.048 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 12.307 ? 0.052 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 12.398 ? 0.075 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 12.552 ? 0.122 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 12.490 ? 0.083 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 12.565 ? 0.092 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 19.059 ? 0.958 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 19.268 ? 0.124 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 20.059 ? 0.114 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 25.117 ? 0.368 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 32.735 ? 0.359 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 34.866 ? 0.152 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 35.492 ? 0.276 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 36.620 ? 0.334 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 37.226 ? 0.180 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 37.774 ? 0.241 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 38.627 ? 1.451 ns/op >> Sec... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Minor improvements according to review suggestions. 
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2160: > 2158: r_array_length == R5_ARG3 && \ > 2159: (r_array_index == R6_ARG4 || r_array_index == noreg) && \ > 2160: (r_sub_klass == R7_ARG5 || r_sub_klass == noreg) && \ Maybe we can set `r_super_klass = R5` and `r_sub_klass =R7` to keep consistency in `c1_Runtime1_ppc.cpp`: case slow_subtype_check_id: { // Support for uint StubRoutine::partial_subtype_check( Klass sub, Klass super ); const Register sub_klass = R5, super_klass = R4, temp1_reg = R6, temp2_reg = R0; __ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, temp2_reg); // returns with CR0.eq if successful __ crandc(CCR0, Assembler::equal, CCR0, Assembler::equal); // failed: CR0.ne __ blr(); } break; I can see this being done for `aarch64`, `x86` and `risc-v` as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1640004253 From gcao at openjdk.org Fri Jun 14 15:57:13 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 14 Jun 2024 15:57:13 GMT Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: <5kfe-OxwcEK0A20RTgeuue-RB-mkk16xz_SfJvIW-iE=.41b3cdad-0ab8-4b1e-8e16-552028d0bbd2@github.com> References: <5kfe-OxwcEK0A20RTgeuue-RB-mkk16xz_SfJvIW-iE=.41b3cdad-0ab8-4b1e-8e16-552028d0bbd2@github.com> Message-ID: On Fri, 14 Jun 2024 15:54:07 GMT, Gui Cao wrote: >> Another fix (avoid any potential naming conflict in the future) could be use `CPUInfo.getFeatures()` instead of WHITE_BOX.getCPUFeatures() at: >> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L419 >> >> The existing checking with WHITE_BOX.getCPUFeatures() is error-prone, I suppose it would break this or that in the future. > >> Could there be any naming conflicting in the future? ie. there will be extension named rvd, etc. 
I'm not sure if the current riscv extension naming convention will avoid this situation, if answer is postive, then it looks good. > > Hi, Thanks for the review. > As I see it, new riscv extensions are now officially named with a Z prefix, and it's unlikely that rvd will be used for a future extenstion, as it's too ambiguous. Also, I see software like QEMU[1] also uses names like RVI | RVM | RVA | RVF | RVD | RVC elsewhere, so I guess it's not a big problem for us to use rv as a prefix. > > [1] https://github.com/qemu/qemu/blob/046a64b9801343e2e89eef10c7a48eec8d8c0d4f/target/riscv/cpu.c#L436-L442 > Another fix (avoid any potential naming conflict in the future) could be use `CPUInfo.getFeatures()` instead of WHITE_BOX.getCPUFeatures() at: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L419 > > The existing checking with WHITE_BOX.getCPUFeatures() is error-prone, I suppose it would break this or that in the future. I prefer to use rvv for matching in the tests as I think the current approach (vm.cpu.features ~= ".*v,.*") is error-prone because it's easy for people to ignore the comma which is needed for single-character riscv extensions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19686#discussion_r1640025760 From gcao at openjdk.org Fri Jun 14 15:57:13 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 14 Jun 2024 15:57:13 GMT Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: References: Message-ID: <5kfe-OxwcEK0A20RTgeuue-RB-mkk16xz_SfJvIW-iE=.41b3cdad-0ab8-4b1e-8e16-552028d0bbd2@github.com> On Fri, 14 Jun 2024 14:51:34 GMT, Hamlin Li wrote: >> src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 137: >> >>> 135: // like rvc, rvv, etc so that it will be easier to specify >>> 136: // target feature string in tests. >>> 137: strcat(buf, " rv"); >> >> Could there be any naming conflicting in the future? ie. 
there will be an extension named rvd, etc.
>> I'm not sure if the current riscv extension naming convention will avoid this situation; if the answer is positive, then it looks good.
>
> Another fix (to avoid any potential naming conflict in the future) could be to use `CPUInfo.getFeatures()` instead of WHITE_BOX.getCPUFeatures() at:
> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L419
>
> The existing check with WHITE_BOX.getCPUFeatures() is error-prone; I suppose it would break this or that in the future.

> Could there be any naming conflict in the future? i.e. there will be an extension named rvd, etc. I'm not sure if the current riscv extension naming convention will avoid this situation; if the answer is positive, then it looks good.

Hi, Thanks for the review.
As I see it, new riscv extensions are now officially named with a Z prefix, and it's unlikely that rvd will be used for a future extension, as it's too ambiguous. Also, I see software like QEMU[1] also uses names like RVI | RVM | RVA | RVF | RVD | RVC elsewhere, so I guess it's not a big problem for us to use rv as a prefix.

[1] https://github.com/qemu/qemu/blob/046a64b9801343e2e89eef10c7a48eec8d8c0d4f/target/riscv/cpu.c#L436-L442

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19686#discussion_r1640024898

From lmesnik at openjdk.org Fri Jun 14 16:44:58 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 14 Jun 2024 16:44:58 GMT
Subject: RFR: 8332252: Clean up vmTestbase/vm/share
Message-ID:

The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only a small number of tests or not used at all. There are no plans to actively develop new tests in vmTestbase and improve this shared library. The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library.
Also, this PR moves test-specific code into corresponding test directories to increase code locality. This makes it easier to move tests out of vmTestbase later. The few remaining classes include InMemoryJavaCompiler.java, which is very similar to the same class from the standard testlibrary and could be merged with it, and ProcessUtils.java, which is used by test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java and thus should be moved into the standard testlibrary. The stack and options might be merged into the nsk/share test library.

-------------

Commit messages:
- 8332252: Clean up vmTestbase/vm/share

Changes: https://git.openjdk.org/jdk/pull/19727/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19727&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8332252
Stats: 1647 lines in 44 files changed: 17 ins; 1586 del; 44 mod
Patch: https://git.openjdk.org/jdk/pull/19727.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19727/head:pull/19727

PR: https://git.openjdk.org/jdk/pull/19727

From mli at openjdk.org Fri Jun 14 17:29:16 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 14 Jun 2024 17:29:16 GMT
Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV
In-Reply-To:
References: <5kfe-OxwcEK0A20RTgeuue-RB-mkk16xz_SfJvIW-iE=.41b3cdad-0ab8-4b1e-8e16-552028d0bbd2@github.com>
Message-ID:

On Fri, 14 Jun 2024 15:54:57 GMT, Gui Cao wrote:

>>> Could there be any naming conflict in the future? i.e. there will be an extension named rvd, etc. I'm not sure if the current riscv extension naming convention will avoid this situation; if the answer is positive, then it looks good.
>>
>> Hi, Thanks for the review.
>> As I see it, new riscv extensions are now officially named with a Z prefix, and it's unlikely that rvd will be used for a future extension, as it's too ambiguous.
Also, I see software like QEMU[1] also uses names like RVI | RVM | RVA | RVF | RVD | RVC elsewhere, so I guess it's not a big problem for us to use rv as a prefix. >> >> [1] https://github.com/qemu/qemu/blob/046a64b9801343e2e89eef10c7a48eec8d8c0d4f/target/riscv/cpu.c#L436-L442 > >> Another fix (avoid any potential naming conflict in the future) could be use `CPUInfo.getFeatures()` instead of WHITE_BOX.getCPUFeatures() at: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java#L419 >> >> The existing checking with WHITE_BOX.getCPUFeatures() is error-prone, I suppose it would break this or that in the future. > > I prefer to use rvv for matching in the tests as I think the current approach (vm.cpu.features ~= ".*v,.*") is error-prone because it's easy for people to ignore the comma which is needed for single-character riscv extensions. Yes, rv* is much better, I'm OK with this renaming. At the same time, can you fix `WHITE_BOX.getCPUFeatures()` with `CPUInfo.getFeatures()` in IREncodingPrinter.java? As I think it's the final fix for this kind of issue. As I said, with a `String.contains(xxx)`, it could fail with other cpu features in the future, as it mixes all cpu features in one long string, and there is no guarantee the similar issue will not happen again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19686#discussion_r1640145033 From kvn at openjdk.org Fri Jun 14 17:29:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 14 Jun 2024 17:29:19 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v2] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: <2CCaghpko4mT70yCsTnHdlHpCvCpGwJW6c_C_Vqfnoc=.53fcfb1e-28d7-49bd-b8be-1fdb8321bcba@github.com> On Fri, 14 Jun 2024 10:54:01 GMT, Jatin Bhateja wrote: >> Intel? Advanced Performance Extensions (Intel? 
APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 32-bit build fixes. 
Some JVMCI tests failed in GHA on all x64 platforms:

# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000109b66d2c, pid=19199, tid=26883
#
# JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-jatin-bhateja)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-jatin-bhateja, mixed mode, sharing, tiered, jvmci, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# Problematic frame:
# V [libjvm.dylib+0x117ed2c] StackValue* StackValue::create_stack_value(ScopeValue*, unsigned char*, RegisterMap const*)+0x27c

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19042#issuecomment-2168465702

From duke at openjdk.org Fri Jun 14 17:59:22 2024
From: duke at openjdk.org (snadampal)
Date: Fri, 14 Jun 2024 17:59:22 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64
In-Reply-To:
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

On Wed, 15 May 2024 15:27:23 GMT, Mikhail Ablakatov wrote:

>> Hi,
>>
>>> I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good?
>>
>> Yes, that's right.
>
> Hi @theRealAph ! You may find the latest version here: https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b . I gave a short explanation in the commit message, feel free to ask for more details if required.
>
> Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks. Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here.

Hi @mikabl-arm , the improvements for larger sizes look impressive, good work! any timeline for getting it merged?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2168348133 From ihse at openjdk.org Fri Jun 14 19:31:37 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 14 Jun 2024 19:31:37 GMT Subject: RFR: 8333268: Fixes for static build Message-ID: This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). 2) Remove the work-arounds to exclude duplicated symbols. 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). 
------------- Commit messages: - Merge branch 'master' into static-linking-progress - Move the exported JVM_IsStaticallyLinked to a better location - Use runtime lookup of static vs dynamic instead of #ifdef STATIC_BUILD - Copy fix for init_system_properties_values on linux - Make sure we do not try to build static libraries on Windows - 8333268: Fixes for static build Changes: https://git.openjdk.org/jdk/pull/19478/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19478&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333268 Stats: 440 lines in 28 files changed: 203 ins; 74 del; 163 mod Patch: https://git.openjdk.org/jdk/pull/19478.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19478/head:pull/19478 PR: https://git.openjdk.org/jdk/pull/19478 From ihse at openjdk.org Fri Jun 14 19:31:37 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 14 Jun 2024 19:31:37 GMT Subject: RFR: 8333268: Fixes for static build In-Reply-To: References: Message-ID: On Thu, 30 May 2024 13:00:21 GMT, Magnus Ihse Bursie wrote: > This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: > > 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). > > 2) Remove the work-arounds to exclude duplicated symbols. > > 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. > > The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). 
Some open questions:

* Does `os::lookup_function` need to be implemented on Windows too, for symmetry, even if it is only used on Unix platforms?
* Many of the changes in Hotspot boil down to `os::dll_load` doing the wrong thing when running with a static build. Perhaps we should provide a better function that knows how to find and load a symbol for both static and dynamic builds, and use that instead of making a lot of tests for static/dynamic at each location where we need to look up a symbol from some other JDK library.
* I managed to replace most of the #ifdef STATIC_BUILD with runtime checks. There are some places remaining though. Apart from the #ifdefs needed for JNI/JVMTI, which will need spec changes to address, there is code in java_md_macosx.m, jio.c and awt_Mlib.c that I did not manage to turn into runtime checks. They will need some more thorough work than just changing an `#ifdef` to an `if () {`.
* And of course, the code in the build system to share all .o files except the two linktype files is still under development...

I moved this away from Draft state, since I think it needs some visibility, especially since it touches several different parts of the code base, and such reviews tend to take time. I think the code here is good and basically okay to integrate. This patch will not on its own solve the entire problem of building a proper static launcher, but it takes several important steps along the way. I think the changes here are reasonable to integrate into mainline at this point.
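
The third open question above, turning an `#ifdef STATIC_BUILD` into a runtime check, could look roughly like the sketch below. It is only an illustration under assumptions: `g_statically_linked`, `is_statically_linked()` and `library_path()` are hypothetical names made up for this example, not functions touched by the patch, and in a real build the answer would presumably be fixed at link time rather than held in a mutable variable.

```cpp
#include <string>

// Hypothetical flag. In a real build it would be fixed at link time by
// linking in one of two tiny "link type" object files; it is a plain
// variable here only so the sketch can exercise both branches.
static bool g_statically_linked = false;

// Hypothetical runtime query standing in for the STATIC_BUILD macro.
bool is_statically_linked() {
  return g_statically_linked;
}

// Compile-time variant that the runtime check replaces:
//
//   #ifdef STATIC_BUILD
//     return std::string();
//   #else
//     return dir + "/lib" + name + ".so";
//   #endif
//
// Returns the file to load for a JDK library, or an empty string when the
// library's symbols are already part of the running executable.
std::string library_path(const std::string& dir, const std::string& name) {
  if (is_statically_linked()) {
    return std::string();
  }
  return dir + "/lib" + name + ".so";
}
```

Since both branches are compiled in every build, only the file that defines the flag has to differ between a static and a dynamic build, which is what would allow the remaining object files to be shared.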
------------- PR Comment: https://git.openjdk.org/jdk/pull/19478#issuecomment-2140743300 PR Comment: https://git.openjdk.org/jdk/pull/19478#issuecomment-2168635393 From ihse at openjdk.org Fri Jun 14 19:41:10 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 14 Jun 2024 19:41:10 GMT Subject: RFR: 8333268: Fixes for static build In-Reply-To: References: Message-ID: On Thu, 30 May 2024 13:00:21 GMT, Magnus Ihse Bursie wrote: > This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: > > 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). > > 2) Remove the work-arounds to exclude duplicated symbols. > > 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. > > The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). The GHA tests fails when building gtest on Linux. This will require some investigation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19478#issuecomment-2168647325 From coleenp at openjdk.org Fri Jun 14 19:46:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 14 Jun 2024 19:46:12 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 16:37:15 GMT, Leonid Mesnik wrote: > The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. 
> The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. > > Also, this PR moves test-specific code into corresponding test directories to increase code locality. This allows later easier move tests from vmTestbase. > > The few remaining classes include > InMemoryJavaCompiler.java > that is very similar to same class from the standard testlibrary and could be merge with it and > ProcessUtils.java > which is used by > test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java > and thus should be moved into the standard testlibrary. > The stack and options might be merged in nsk/share test library. Thank you so much for this cleanup work. I have one suggestion which hopefully will work. test/hotspot/jtreg/vmTestbase/metaspace/share/TriggerUnloadingWithWhiteBox.java line 23: > 21: * questions. > 22: */ > 23: package metaspace.share; There's a triggerUnloading call here: test/lib/jdk/test/lib/classloader/ClassUnloadCommon.java You might be able to also remove this file (and maybe the others) and use the ClassUnloadCommon version. ------------- PR Review: https://git.openjdk.org/jdk/pull/19727#pullrequestreview-2119124129 PR Review Comment: https://git.openjdk.org/jdk/pull/19727#discussion_r1640267492 From cjplummer at openjdk.org Fri Jun 14 19:54:13 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 14 Jun 2024 19:54:13 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 16:37:15 GMT, Leonid Mesnik wrote: > The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. > The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. 
> > Also, this PR moves test-specific code into corresponding test directories to increase code locality. This allows later easier move tests from vmTestbase. > > The few remaining classes include > InMemoryJavaCompiler.java > that is very similar to same class from the standard testlibrary and could be merge with it and > ProcessUtils.java > which is used by > test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java > and thus should be moved into the standard testlibrary. > The stack and options might be merged in nsk/share test library. test/hotspot/jtreg/vmTestbase/vm/compiler/complog/share/LogCompilationTest.java line 32: > 30: import vm.share.options.Option; > 31: import vm.share.options.OptionSupport; > 32: import vm.share.process.ProcessExecutor; You got rid of this import, but ProcessExecutor is still referenced below. Is this file even referenced during test execution? test/hotspot/jtreg/vmTestbase/vm/compiler/complog/share/ProcessExecutor.java line 135: > 133: > 134: public long getPid() { > 135: return process.toHandle().pid(); `tohandle()` is not necessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19727#discussion_r1640277346 PR Review Comment: https://git.openjdk.org/jdk/pull/19727#discussion_r1640260862 From lmesnik at openjdk.org Fri Jun 14 20:14:12 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 14 Jun 2024 20:14:12 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 19:51:29 GMT, Chris Plummer wrote: >> The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. >> The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. 
>> >> Also, this PR moves test-specific code into corresponding test directories to increase code locality. This allows later easier move tests from vmTestbase. >> >> The few remaining classes include >> InMemoryJavaCompiler.java >> that is very similar to same class from the standard testlibrary and could be merge with it and >> ProcessUtils.java >> which is used by >> test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java >> and thus should be moved into the standard testlibrary. >> The stack and options might be merged in nsk/share test library. > > test/hotspot/jtreg/vmTestbase/vm/compiler/complog/share/LogCompilationTest.java line 32: > >> 30: import vm.share.options.Option; >> 31: import vm.share.options.OptionSupport; >> 32: import vm.share.process.ProcessExecutor; > > You got rid of this import, but ProcessExecutor is still referenced below. Is this file even referenced during test execution? The ProcessExecutor has been moved into this package, so it is local package now. Double checked that it is used and tests jtreg:open/test/hotspot/jtreg/vmTestbase/vm/compiler/complog still pass. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19727#discussion_r1640299885 From lmesnik at openjdk.org Fri Jun 14 20:18:10 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 14 Jun 2024 20:18:10 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share [v2] In-Reply-To: References: Message-ID: <7KcCGNzCVZPSzKdJFpqKfDMPlDiiMz52gyTbkZm10c8=.2e3fdcb1-07b1-4a7a-93de-f13dc1faf0ab@github.com> > The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. > The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. 
> > Also, this PR moves test-specific code into corresponding test directories to increase code locality. This allows later easier move tests from vmTestbase. > > The few remaining classes include > InMemoryJavaCompiler.java > that is very similar to same class from the standard testlibrary and could be merge with it and > ProcessUtils.java > which is used by > test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java > and thus should be moved into the standard testlibrary. > The stack and options might be merged in nsk/share test library. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: removed toHandle() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19727/files - new: https://git.openjdk.org/jdk/pull/19727/files/275c9a00..f8a637dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19727&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19727&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19727/head:pull/19727 PR: https://git.openjdk.org/jdk/pull/19727 From lmesnik at openjdk.org Fri Jun 14 20:20:20 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 14 Jun 2024 20:20:20 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share [v2] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 19:42:21 GMT, Coleen Phillimore wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> removed toHandle() > > test/hotspot/jtreg/vmTestbase/metaspace/share/TriggerUnloadingWithWhiteBox.java line 23: > >> 21: * questions. >> 22: */ >> 23: package metaspace.share; > > There's a triggerUnloading call here: > > test/lib/jdk/test/lib/classloader/ClassUnloadCommon.java > > You might be able to also remove this file (and maybe the others) and use the ClassUnloadCommon version. Thanks. 
I filed https://bugs.openjdk.org/browse/JDK-8334320
The functionality is a little different, so more testing might be required for the changed tests.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19727#discussion_r1640315974

From duke at openjdk.org Fri Jun 14 20:31:44 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 14 Jun 2024 20:31:44 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538
Message-ID:

This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much.

The fix is to undo the 'int' return type on mult()/square(), which allowed returning a partially reduced result (i.e. this avoids extra reductions when the mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR.

I have a slightly better mult() intrinsic that does reduction at the end, but decided to use a more conservative fix and just keep the reduction in Java (i.e. original mult() refactored into multImpl() and reducePositive()). Will commit these optimizations I discovered while working on this in the next release.

---

Performance before Montgomery PR:

Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s

Performance in master without mult() intrinsic

Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6539.589 ± 132.844 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6202.530 ± 124.496 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1967.038 ± 15.819 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1931.667 ± 22.901 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1354.143 ± 24.861 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1354.139 ± 21.904 ops/s

Performance in master with mult() intrinsic

Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 10534.707 ± 20.690 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 9729.246 ± 102.803 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 3549.011 ± 77.343 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 3458.107 ± 14.622 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 2563.566 ± 94.381 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 2569.143 ± 53.337 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 22.071 ops/s

THIS PR without mult intrinsic

Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6225.541 ± 111.874 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 5913.876 ± 121.556 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1837.740 ± 42.881 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1815.064 ± 72.015 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1271.716 ± 17.119 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1265.405 ± 19.382 ops/s

THIS PR with mult intrinsic

Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 9560.700 ± 232.557 ops/s
SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 8916.806 ± 164.756 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 3064.470 ± 72.166 ops/s
SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 2991.568 ± 75.720 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 2200.308 ± 13.744 ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 2203.028 ± 1.948 ops/s
Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8514.924 ± 59.022 ops/s

-------------

Commit messages:
 - whitespace
 - better reduction refactoring
 - Undo incomplete p256 mult reduction optimization

Changes: https://git.openjdk.org/jdk/pull/19728/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19728&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8333583
Stats: 130 lines in 9 files changed: 53 ins; 37 del; 40 mod
Patch: https://git.openjdk.org/jdk/pull/19728.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19728/head:pull/19728

PR: https://git.openjdk.org/jdk/pull/19728

From duke at openjdk.org Fri Jun 14 20:38:16 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 14 Jun 2024 20:38:16 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote:

> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much.
>
> The fix is to undo the 'int' return type on mult()/square(), which allowed returning a partially reduced result (i.e. this avoids extra reductions when the mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR.
>
> I have a slightly better mult() intrinsic that does reduction at the end, but decided to use a more conservative fix and just keep the reduction in Java (i.e. original mult() refactored into multImpl() and reducePositive()). Will commit these optimizations I discovered while working on this in the next release.
>
> ---
>
> Performance before Montgomery PR:
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s
>
> Performance in master without mult() intrinsic
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6539.589 ± 132.844 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6202.530 ± 124.496 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1967.0...

@ascarpino Would you mind reviewing this again please? Mostly Java you reviewed before.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2168728473

From gziemski at openjdk.org Fri Jun 14 20:53:10 2024
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Fri, 14 Jun 2024 20:53:10 GMT
Subject: RFR: 8333994: NMT: call stacks should show source information
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 12:38:09 GMT, Thomas Stuefe wrote:

> Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks.

It looks like `Decoder::get_source_info()` is not implemented on macOS, so I filed https://bugs.openjdk.org/browse/JDK-8334323 to fix this.

It would be useful to show an example of the improved printout here.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2168742706
PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2168745693

From gziemski at openjdk.org Fri Jun 14 21:02:12 2024
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Fri, 14 Jun 2024 21:02:12 GMT
Subject: RFR: 8333994: NMT: call stacks should show source information
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 12:38:09 GMT, Thomas Stuefe wrote:

> Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks.

There is the following comment in this code:

> // Note: we deliberately omit printing source information here. NativeCallStack::print_on()
> // can be called thousands of times as part of NMT detail reporting, and source printing
> // can slow down reporting by a factor of 5 or more depending on platform (see JDK-8296931).

but we are in fact looking up and printing more detail here. Is that comment no longer relevant, or is the slowdown that goes with this change insignificant?

-------------

Changes requested by gziemski (Committer).

PR Review: https://git.openjdk.org/jdk/pull/19655#pullrequestreview-2119284529

From mdoerr at openjdk.org Fri Jun 14 21:08:27 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 14 Jun 2024 21:08:27 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v7]
In-Reply-To:
References:
Message-ID: <9taWn72w6sshuD8oA5gTuv8upZOb2Vx_eSTN0UW8i6Q=.c84fa389-6f04-459f-a0bd-0b45f8b69a29@github.com>

> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.)
> How can we verify it? By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op
> Seco...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Use same registers for sub_klass and super_klass as C1.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19368/files
  - new: https://git.openjdk.org/jdk/pull/19368/files/bea2f938..6136dbb9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=05-06

Stats: 16 lines in 3 files changed: 3 ins; 0 del; 13 mod
Patch: https://git.openjdk.org/jdk/pull/19368.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368

PR: https://git.openjdk.org/jdk/pull/19368

From mdoerr at openjdk.org Fri Jun 14 21:08:28 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 14 Jun 2024 21:08:28 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 15:35:43 GMT, Amit Kumar wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Minor improvements according to review suggestions.
> > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2160: > >> 2158: r_array_length == R5_ARG3 && \ >> 2159: (r_array_index == R6_ARG4 || r_array_index == noreg) && \ >> 2160: (r_sub_klass == R7_ARG5 || r_sub_klass == noreg) && \ > > Maybe we can set `r_super_klass = R5` and `r_sub_klass =R7` to keep consistency in `c1_Runtime1_ppc.cpp`: > > > case slow_subtype_check_id: > { // Support for uint StubRoutine::partial_subtype_check( Klass sub, Klass super ); > const Register sub_klass = R5, > super_klass = R4, > temp1_reg = R6, > temp2_reg = R0; > __ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, temp2_reg); // returns with CR0.eq if successful > __ crandc(CCR0, Assembler::equal, CCR0, Assembler::equal); // failed: CR0.ne > __ blr(); > } > break; > > > I can see this being done for `aarch64`, `x86` and `risc-v` as well. Ok, using the same registers for sub_klass and super_klass as C1 should do no harm. See latest commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1640355111 From zgu at openjdk.org Fri Jun 14 21:15:16 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 14 Jun 2024 21:15:16 GMT Subject: RFR: 8333994: NMT: call stacks should show source information In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 20:59:50 GMT, Gerard Ziemski wrote: > There is a following comment in this code that says: > > > ``` > > // Note: we deliberately omit printing source information here. NativeCallStack::print_on() > > // can be called thousands of times as part of NMT detail reporting, and source printing > > // can slow down reporting by a factor of 5 or more depending on platform (see JDK-8296931). > > ``` > > but we are in fact looking up and printing more detail here. Is that comment no longer relevant, or is the slow down that goes with this change insignificant? I have the same question. Did dwarf decoder performance improve? If so, could you point me the PR? Thanks! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2168770022

From duke at openjdk.org Fri Jun 14 22:01:44 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 14 Jun 2024 22:01:44 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]
In-Reply-To:
References:
Message-ID:

> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much.
>
> The fix is to undo the 'int' return type on mult()/square(), which allowed returning a partially reduced result (e.g. this avoids extra reductions when the mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR.
>
> ---
> XDH.generateSecret performance
> before Montgomery PR:
>
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s
>
> after Montgomery PR:
>
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 22.071 ops/s
>
> with this PR:
>
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ± 32.858 ops/s
>
> ---
>
> P256 performance with/without mult intrinsic:
>
> Performance before Montgomery PR:
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s
> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s
> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s
> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s
>
> Performance in master without mult() intrinsic
>
> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Err...

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  Improve non-intrinsic p256 performance

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19728/files
  - new: https://git.openjdk.org/jdk/pull/19728/files/0219018b..2ab7bcbd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19728&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19728&range=00-01

Stats: 43 lines in 2 files changed: 5 ins; 38 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/19728.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19728/head:pull/19728

PR: https://git.openjdk.org/jdk/pull/19728

From jonathanjoo at google.com Fri Jun 14 22:56:46 2024
From: jonathanjoo at google.com (Jonathan Joo)
Date: Fri, 14 Jun 2024 15:56:46 -0700
Subject: Adaptable Heap Sizing for G1 GC
In-Reply-To: <8717965B-DD60-4D97-8AA8-564194083D51@oracle.com>
References: <8717965B-DD60-4D97-8AA8-564194083D51@oracle.com>
Message-ID:

Hi Erik,

We had a similar vision with regards to not having to set heap sizes manually :) Agreed that with the currently proposed OpenJDK changes alone, there would be no effect for the user, just an entry point to allow for more intelligent heap sizing. We definitely do want to ship a policy that actually calculates and sets these flags, but I think a good point for discussion is *how* to ship such a policy. Note that as long as the two flags are introduced into the OpenJDK, there is always a way for people to modify the flags on their own and get AHS-like behavior.
I guess the question is, to what extent do we want to take our current implementation of AHS logic, and move that from outside the JVM into the JVM? I think there are a few different possibilities, given that currently, AHS relies on internal Google services to access all the data we need.

1. Try to replicate exactly the way AHS works using APIs available from within hotspot code. For example, querying container limit and fullness information in a way that can work in any generic container environment. (Is there a good way to obtain this?)
2. Come up with a potentially less complex, but general working solution that is maintained solely within the hotspot code. The cons of this is that Google's implementation and upstream's implementation will diverge, and so there is more maintenance overhead from our end. It also won't have as robust functionality as the solution we are using at Google.
3. Don't bother with importing any AHS logic into the OpenJDK, but instead simply open-source/publish our current policies. This would allow for people to adopt their own implementations of AHS to plug it in a way they see fit, or fiddle with our code and integrate it into their own environments. Though I agree that without access to a special launcher or other mechanism to run this code, this approach may have limited usefulness.

I'm not as familiar with logistically how viable it would be to do these solutions. Would love to hear whether you think these approaches are viable, and/or any blockers you might foresee.

Best,

~ Jonathan

On Thu, Jun 13, 2024 at 4:17 AM Erik Osterlund wrote:

> Hi Jonathan,
>
> I'm currently working on automatic heap sizing for ZGC. My vision is that users shouldn't have to set heap sizes. Would love to see that in G1 as well. What you are describing sounds like it would do something similar.
>
> Having said that, it seems like the concrete changes you are proposing for OpenJDK, would not actually yield automatic heap sizing for the user.
> By the sound of it, you would need your special launcher with an extra thread that contains the actual heap sizing policy. The proposed JVM changes are mostly for being *able* to change the heap sizing policies externally, but without any policy shipped that actually changes it.
>
> While having a pluggable policy is great because anyone can put in their own favourite policy, there is also an obvious disadvantage that 99.9% of deployments won't have any special launcher or supplier of an external heap sizing policy, or even know what we are talking about. Therefore, unless we also ship the policies, I unfortunately think that limits the usefulness of the feature. If, however, a policy was shipped so the heap can be sized automatically, I think that would make it much more widely useful.
>
> In my automatic heap sizing work, the goal is to ship both the mechanisms and the policies needed to automatically size (and resize) the heap, adapting to changing load and environments. Are you open to the idea of shipping a policy that actually changes the heap size as well? It would be great to be aligned on this, I think.
>
> Thanks,
> /Erik
>
> On 13 Jun 2024, at 01:32, Jonathan Joo wrote:
>
> Hello hotspot-dev and hotspot-gc-dev,
>
> I'd like to reopen discussion on Adaptable Heap Sizing (AHS) for the G1 Garbage Collector, since we now have some time to dedicate to bringing this effort to the OpenJDK Community. Please see https://mail.openjdk.org/pipermail/hotspot-gc-dev/2022-September/040096.html for the original thread.
>
> The bullet points contained in the above link are still largely the same, and we have made significant improvements to the service over the past few years, and found success deploying it broadly across jobs internally. Now that we feel the feature has matured, we'd like to introduce it to the OpenJDK community in hopes that it can be adopted for broader use.
>
> In short - the goal of Adaptable Heap Sizing is to improve memory usage and reduce OOMs for Java applications, especially those deployed in containerized environments. The key insights are as follows:
>
>    1. Applications with low memory requirements but configured with high RAM often use RAM unnecessarily. We can utilize GC CPU overhead metrics to help guide heap sizing, allowing for RAM savings in these scenarios.
>    2. For Java applications running in containers, we can bound Java heap usage based on our knowledge of the current container memory usage as well as the current container size, to prevent container OOMs.
>
> The implementation of AHS currently involves some fairly lightweight changes to the JVM, through the introduction of two new manageable flags. They are essentially the same as these two (open feature requests):
>
>    - https://bugs.openjdk.org/browse/JDK-8236073
>    - https://bugs.openjdk.org/browse/JDK-8204088
>
> In addition, we have a separate thread (outside of the JVM, in our custom Java launcher) which reads in GC CPU overhead data and container information, and calculates appropriate values for these two flags. We call this the AHS worker thread, and this thread updates frequently (currently every second). The vast majority of the AHS logic is in this worker thread - the introduction of the new JVM flags above simply gives AHS a way to tune GC heuristics given this additional information.
>
> Thomas Schatzl mentioned there is a similar-sounding effort going on in ZGC, and also there were folks outside of Google who expressed interest in this project, so I think it is an appropriate time to discuss this again on an open forum. Given the positive results we've had deploying AHS internally at Google, we feel this is a valuable feature to the broader Java community that should be able to be leveraged by all to achieve more stable and efficient Java heap behavior.
>
> I'd appreciate hearing peoples' thoughts on this. Thank you!
>
> ~ Jonathan
>
> (P.S. For more information, a talk given about this project can be viewed here, though it is somewhat dated.)

From sviswanathan at openjdk.org Fri Jun 14 23:48:12 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 14 Jun 2024 23:48:12 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 22:01:44 GMT, Volodymyr Paprotski wrote:

>> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much.
>>
>> The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR.
>>
>> ---
>> XDH.generateSecret performance
>> before Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8435.277 ± 27.230  ops/s
>>
>> after Montgomery PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8309.028 ± 22.071  ops/s
>>
>> with this PR:
>>
>> Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8491.268 ± 32.858  ops/s
>>
>> ---
>>
>> P256 performance with/without mult intrinsic:
>>
>> Performance before Montgomery PR:
>>
>> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6398.727 ±  7.400  ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6129.739 ±  5.995  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1889.928 ± 54.660  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1866.339 ± 42.438  ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1350.745 ± 28.514  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1349.393 ± 32.050  ops/s
>>
>> Performance in master without mult() intrinsic
>>
>> Benchmark ...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   Improve non-intrinsic p256 performance

src/hotspot/share/opto/runtime.cpp line 1417:

> 1415:   // result type needed
> 1416:   fields = TypeTuple::fields(1);
> 1417:   fields[TypeFunc::Parms + 0] = NULL;

A minor nit: here NULL could be nullptr instead.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19728#discussion_r1640466077

From cjplummer at openjdk.org Sat Jun 15 00:04:13 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Sat, 15 Jun 2024 00:04:13 GMT
Subject: RFR: 8332252: Clean up vmTestbase/vm/share [v2]
In-Reply-To: <7KcCGNzCVZPSzKdJFpqKfDMPlDiiMz52gyTbkZm10c8=.2e3fdcb1-07b1-4a7a-93de-f13dc1faf0ab@github.com>
References: <7KcCGNzCVZPSzKdJFpqKfDMPlDiiMz52gyTbkZm10c8=.2e3fdcb1-07b1-4a7a-93de-f13dc1faf0ab@github.com>
Message-ID:

On Fri, 14 Jun 2024 20:18:10 GMT, Leonid Mesnik wrote:

>> The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used only by a small number of tests or not used at all. There are no plans to actively develop new tests in vmTestbase and improve this shared library.
>> The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library.
>>
>> Also, this PR moves test-specific code into corresponding test directories to increase code locality.
>> This makes it easier to move tests out of vmTestbase later.
>>
>> The few remaining classes include
>> InMemoryJavaCompiler.java
>> that is very similar to the same class from the standard testlibrary and could be merged with it and
>> ProcessUtils.java
>> which is used by
>> test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java
>> and thus should be moved into the standard testlibrary.
>> The stack and options might be merged in nsk/share test library.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
>   removed toHandle()

Marked as reviewed by cjplummer (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19727#pullrequestreview-2119605926

From dholmes at openjdk.org Sat Jun 15 05:12:15 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sat, 15 Jun 2024 05:12:15 GMT
Subject: RFR: 8333962: Obsolete OldSize [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 10:19:47 GMT, Albert Mingkun Yang wrote:

>> Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size.
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
>   obsolete-old-size

Marked as reviewed by dholmes (Reviewer).

src/hotspot/share/runtime/arguments.cpp line 37:

> 35: #include "gc/shared/gcArguments.hpp"
> 36: #include "gc/shared/gcConfig.hpp"
> 37: #include "gc/shared/genArguments.hpp"

Why is this needed?

-------------

PR Review: https://git.openjdk.org/jdk/pull/19647#pullrequestreview-2120074670
PR Review Comment: https://git.openjdk.org/jdk/pull/19647#discussion_r1640761871

From stuefe at openjdk.org Sat Jun 15 05:41:35 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 15 Jun 2024 05:41:35 GMT
Subject: RFR: 8334223: Make Arena MEMFLAGs immutable
Message-ID:

Arenas carry NMT flags. An arena should never change that flag.
But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor.

As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable.

The patch does that:
- we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread)
- CompilerThread hands in mtCompiler, all other threads rely on the default
- on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in
- that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena
- it also allows us to make Arena::flags private

Other, unrelated cleanups:
- Made Arena::_size_in_bytes and Arena::_tag private
- Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor
- removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code.

Tests: I manually verified that the NMT numbers printed don't change.
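
The shape of the change described above can be sketched roughly as follows. This is a minimal illustration and not the HotSpot code: `MemFlag`, `kDefaultInitSize`, and the stripped-down `Arena`, `Thread`, and `CompilerThread` classes are simplified stand-ins invented for the example. Only the idea follows the description: the flag is handed to the thread constructor, forwarded to the arenas at creation, and stored in a const member.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for HotSpot's MEMFLAGS; the two values are illustrative only.
enum class MemFlag { mtThread, mtCompiler };

// Hypothetical stand-in for Chunk::init_size.
constexpr std::size_t kDefaultInitSize = 1024;

// Stripped-down arena: the flag is fixed at construction and never changes,
// so no bias_to()-style mutator is needed.
class Arena {
  const MemFlag _flags;
  const std::size_t _init_size;
public:
  // A single constructor with a defaulted init_size replaces two overloads.
  explicit Arena(MemFlag flags, std::size_t init_size = kDefaultInitSize)
    : _flags(flags), _init_size(init_size) {}
  MemFlag flags() const { return _flags; }
  std::size_t init_size() const { return _init_size; }
};

// The thread receives the flag and forwards it to both of its arenas.
class Thread {
  Arena _resource_area;
  Arena _handle_area;
public:
  explicit Thread(MemFlag flags = MemFlag::mtThread)
    : _resource_area(flags), _handle_area(flags) {}
  virtual ~Thread() = default;
  const Arena& resource_area() const { return _resource_area; }
  const Arena& handle_area() const { return _handle_area; }
};

// Only the compiler thread deviates from the default flag.
class CompilerThread : public Thread {
public:
  CompilerThread() : Thread(MemFlag::mtCompiler) {}
};
```

With this shape, a default-constructed `Thread` accounts its arenas to `mtThread`, a `CompilerThread` accounts them to `mtCompiler` from the start, and nothing has to patch the flag after the parent constructor has run.
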
-------------

Commit messages:
 - start

Changes: https://git.openjdk.org/jdk/pull/19693/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19693&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334223
  Stats: 76 lines in 12 files changed: 20 ins; 42 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/19693.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19693/head:pull/19693

PR: https://git.openjdk.org/jdk/pull/19693

From stuefe at openjdk.org Sat Jun 15 05:44:31 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 15 Jun 2024 05:44:31 GMT
Subject: RFR: 8330174: Establish no-access zone at the start of Klass encoding range [v3]
In-Reply-To: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com>
References: <9RShpjQGr5MI3aqK6VqpYgDiUJS3q_Q6Bdo4jWmtJ5g=.764b3747-69be-4a70-a599-d6cb9a02bddd@github.com>
Message-ID:

> After having reserved an address range for the Klass encoding range, we either:
> a) Place CDS, then class space, into that address range
> b) Place only class space in that range (if CDS is off).
>
> For an nKlass of 0, the decoded Klasspointer points to the beginning of the encoding range. Since nKlass=0 is a special value, both CDS (a) and Metaspace (b) ensure that no Klass is placed right at the start of the Klass range.
>
> However, it would also be good to establish a no-access zone at the range's start. Dereferencing an nKlass=0 would then result in an immediate, obvious crash instead of in reading invalid data.
>
> This would closely mimic what we do in the compressed-oops-enabled java heap (albeit there we do it for fault-based null checks, too) and what Operating Systems do with low-address ranges.
>
> ---
>
> The patch:
>
> We can neither move the encoding base down one page (the encoding base is carefully chosen to fit the platform's decoding). Nor can we move CDS archive space up one page (since CDS relies on the archive being placed exactly at the encoding base address).
> Nor do we want to move class space up (since class space start has a high alignment requirement of 16MB, protection zone would need to be 16MB large, which is a waste of address space).
>
> Instead, as before, we just let Metaspace and CDS handle the protection zone internally. For Metaspace, this is very simple. We just protect the first page of class space.
>
> For CDS, it is a tiny bit more complex since we need to leave a "protection-zone-shaped hole" in the first region of the archive when we dump it. We do just that and then give that region a new property, "has protection zone". At runtime, we protect the underlying memory if a mapped region has a protection zone.
>
> With CDS, because the page size can differ between dump- and runtime, the protection zone is the size of CDS core region alignment, not page-sized (e.g. dumping on Linux aarch64 with 4KB pages shall generate an archive that can be used in Docker on MacOS with 16KB pages).
>
> ----
>
> Tests:
> - ran CDS and AppCDS jtreg tests manually on Mac m1
> - manually tested that decoding, then dereferencing an nKlass=0 gives us the new "Fault address is narrow Klass base - dereferencing a zero nKlass?" output in the hs-err file
> - GHAs (which include the new regression test)

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision:

 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Merge branch 'openjdk:master' into cds-metaspace-prot-prefix
 - Update metaspace.cpp
 - cds-metaspace-prot-prefix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19290/files
  - new: https://git.openjdk.org/jdk/pull/19290/files/0477e957..2ccd527d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19290&range=01-02

  Stats: 14900 lines in 579 files changed: 6133 ins; 7257 del; 1510 mod
  Patch: https://git.openjdk.org/jdk/pull/19290.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19290/head:pull/19290

PR: https://git.openjdk.org/jdk/pull/19290

From stuefe at openjdk.org Sat Jun 15 06:29:14 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 15 Jun 2024 06:29:14 GMT
Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v12]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Jun 2024 07:21:43 GMT, Liming Liu wrote:

>> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14.
>
> Liming Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix variable names

Okay then

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18592#pullrequestreview-2120231709

From gcao at openjdk.org Sat Jun 15 06:53:11 2024
From: gcao at openjdk.org (Gui Cao)
Date: Sat, 15 Jun 2024 06:53:11 GMT
Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV
In-Reply-To:
References: <5kfe-OxwcEK0A20RTgeuue-RB-mkk16xz_SfJvIW-iE=.41b3cdad-0ab8-4b1e-8e16-552028d0bbd2@github.com>
Message-ID: <8ZyZ714PEAEWRF290pni1GMVjgVNb0GHjFImmp6xgMw=.9ec8e9b4-be14-48cc-a633-abb59b3fa173@github.com>

On Fri, 14 Jun 2024 17:26:34 GMT, Hamlin Li wrote:

> Yes, rv* is much better, I'm OK with this renaming.
>
> At the same time, can you fix `WHITE_BOX.getCPUFeatures()` with `CPUInfo.getFeatures()` in IREncodingPrinter.java? As I think it's the final fix for this kind of issue. As I said, with a `String.contains(xxx)`, it could fail with other cpu features in the future, as it mixes all cpu features in one long string, and there is no guarantee the similar issue will not happen again.

When I modify it this way, x86 fastdebug has some errors.

``` diff
diff --git a/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java b/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java
index 73943db3f53..03eba7c6c2c 100644
--- a/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java
+++ b/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java
@@ -29,6 +29,7 @@ import compiler.lib.ir_framework.shared.*;
 import jdk.test.lib.Platform;
 import jdk.test.whitebox.WhiteBox;
+import jdk.test.whitebox.cpuinfo.CPUInfo;
 import java.lang.reflect.Method;
 import java.nio.ByteOrder;
@@ -416,7 +417,7 @@ private boolean checkCPUFeature(String feature, String value) {
             TestFormat.failNoThrow("Provided incorrect value for feature " + feature + failAt());
             return false;
         }
-        String cpuFeatures = WHITE_BOX.getCPUFeatures();
+        List cpuFeatures = CPUInfo.getFeatures();
         return (trueValue && cpuFeatures.contains(feature)) || (falseValue && !cpuFeatures.contains(feature));
     }
```

cpu info:

processor : 127
vendor_id : GenuineIntel
cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Platinum 8378C CPU @ 2.80GHz
stepping : 6
microcode : 0x1
cpu MHz : 2799.998
cache size : 58368 KB
physical id : 1
siblings : 64
core id : 31
cpu cores : 32
apicid : 127
initial apicid : 127
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq fsrm md_clear arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs mmio_stale_data eibrs_pbrsb
bogomips : 5599.99
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:

error message:

8) Method "public static void compiler.loopopts.superword.TestDependencyOffsets.testLongP7(long[])" - [Failed IR rules: 1]:
   * @IR rule 4: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", "> 0", "_#V#ADD_VL#_", "> 0", "_#STORE_VECTOR#_", "> 0"}, applyIfPlatformOr={}, applyIfPlatform={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={"avx2", "true", "avx512", "false"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={"AlignVector", "true", "MaxVectorSize", ">= 16"}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[8\]:\{long\})"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(AddVL.*)+(\\s){2}===.*vector[A-Za-z]\[8\]:\{long\})"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

>>> Check stdout for compilation output of the failed methods

I think this happens because we now use CPUInfo.getFeatures() instead of WHITE_BOX.getCPUFeatures(). The machine's CPU flags do not list a plain avx512 entry, but they do include avx512f, avx512dq, etc., so the JVM may be using the avx512xx features when the test actually runs.
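
The difference between the two lookups can be shown with a small sketch. It is written in C++ purely for illustration (the code under discussion is Java), and the helper names `contains_substring`, `split_features`, and `contains_exact` are hypothetical. The assumption being illustrated is the one stated in the thread: a substring search over one long feature string finds `avx512` inside `avx512f`, while an exact-match lookup over a list of individual features does not.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Substring search over the whole feature string, analogous to String.contains():
// "avx512" is also found inside "avx512f", "avx512dq", and so on.
bool contains_substring(const std::string& all_features, const std::string& feature) {
  return all_features.find(feature) != std::string::npos;
}

// Split a whitespace-separated feature string into individual entries.
std::vector<std::string> split_features(const std::string& all_features) {
  std::istringstream in(all_features);
  std::vector<std::string> out;
  std::string f;
  while (in >> f) {
    out.push_back(f);
  }
  return out;
}

// Exact-match lookup over the list, analogous to List.contains():
// only an entry spelled exactly like `feature` counts.
bool contains_exact(const std::vector<std::string>& features, const std::string& feature) {
  return std::find(features.begin(), features.end(), feature) != features.end();
}
```

For a flag set such as `avx2 avx512f avx512dq`, the substring lookup reports `avx512` as present and the exact lookup reports it as absent, so a rule guarded by `"avx512", "false"` would be skipped with the first lookup and applied with the second.
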

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19686#discussion_r1640824142

From stuefe at openjdk.org Sat Jun 15 06:54:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 15 Jun 2024 06:54:17 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v18]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Jun 2024 21:02:33 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.
>>
>> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.
>>
>> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>>
>> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>>
>> The results are as follows on linux-x64-slowdebug:
>>
>>
>> Generate stacks...
>> Done
>> Time taken with GrowableArray: 8341.240945
>> Time taken with CHeap: 12189.031318
>> Time taken with Arena: 8800.703092
>> Time taken with GrowableArray again: 8295.508829
>>
>>
>> And on linux-x64:
>>
>>
>> Time taken with GrowableArray: 8378.018135
>> Time taken with CHeap: 12437.347868
>> Time taken with Arena: 8758.064717
>> Time taken with GrowableArray again: 8391.076291
>>
>>
>> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove dead CHeap allocator test

More comments. Thanks for taking my suggestions. We need a little gtest for this.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 62:

> 60:   I allocate(Args... args) {
> 61:     BackingElement* be;
> 62:     int i;

I i? Then, later, just return i?

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 79:

> 77:
> 78:   void free(I i) {
> 79:     assert(i != nil || (i > 0 && i < _backing_storage.length()), "out of bounds free");

I think there are some errors here. This is probably broken. Which we would see if the gtests were running, but hotspot common tier1 tests seem broken.

Do we allow passing in nil? Then, i must be either nil or valid, not != nil or valid. If not, use an AND, not an OR.

i=0 is valid

Could you also please factor out OOB test for i?

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 80:

> 78:   void free(I i) {
> 79:     assert(i != nil || (i > 0 && i < _backing_storage.length()), "out of bounds free");
> 80:     if (i != nil) return;

i == nil?
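
To make the review points concrete, here is a minimal sketch of an index-based free-list allocator with the bounds check factored out. It is not the code under review: `std::vector` stands in for `GrowableArray`, a plain struct replaces the backing element, and the names `IndexedFreeList`, `Slot`, and `is_in_bounds` are illustrative assumptions. The sketch takes one possible reading of the questions above: `free()` accepts nil as a no-op, and index 0 is a valid index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

template <typename E>
class IndexedFreeList {
public:
  using I = std::int32_t;       // 4-byte "pointer" into the backing storage
  static constexpr I nil = -1;  // sentinel for "no element"

private:
  struct Slot {
    E value;
    I next_free;  // link to the next free slot; meaningful only while the slot is free
  };
  std::vector<Slot> _slots;  // stand-in for GrowableArray; grows but never shrinks
  I _free_head = nil;

public:
  // Factored-out bounds check: index 0 is valid, nil is not.
  bool is_in_bounds(I i) const {
    return i >= 0 && static_cast<std::size_t>(i) < _slots.size();
  }

  template <typename... Args>
  I allocate(Args&&... args) {
    I i;
    if (_free_head != nil) {
      // Reuse the most recently freed slot.
      i = _free_head;
      _free_head = _slots[i].next_free;
      _slots[i] = Slot{E(std::forward<Args>(args)...), nil};
    } else {
      // No free slot available: append a new one.
      i = static_cast<I>(_slots.size());
      _slots.push_back(Slot{E(std::forward<Args>(args)...), nil});
    }
    return i;
  }

  void free(I i) {
    // i must be either nil or a valid index.
    assert((i == nil || is_in_bounds(i)) && "out of bounds free");
    if (i == nil) return;
    // Push the slot onto the free list. The old value is left in place until
    // the slot is reused, and double frees are not detected in this sketch.
    _slots[i].next_free = _free_head;
    _free_head = i;
  }

  E& at(I i) {
    assert(is_in_bounds(i) && "out of bounds access");
    return _slots[i].value;
  }

  std::size_t capacity() const { return _slots.size(); }
};
```

Allocating two elements, freeing the first, and allocating again returns index 0 a second time without growing the backing storage, which is the free-list behaviour the PR description relies on.
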
------------- PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2120237393 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1640814246 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1640820200 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1640820299 From gcao at openjdk.org Sat Jun 15 06:59:12 2024 From: gcao at openjdk.org (Gui Cao) Date: Sat, 15 Jun 2024 06:59:12 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 13:48:16 GMT, Andrew Haley wrote: > It's worth running the "before" test with `-XX:-UseSecondarySupersCache`. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache. JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, and use `-XX:-UseSecondarySupersCache` to disable `UseSecondarySupersCache` Original(not with patch): Benchmark Mode Cnt Score Error Units [81/1889] SecondarySupersLookup.testNegative00 avgt 15 15.153 ? 0.219 ns/op SecondarySupersLookup.testNegative01 avgt 15 17.029 ? 0.287 ns/op SecondarySupersLookup.testNegative02 avgt 15 20.242 ? 0.353 ns/op SecondarySupersLookup.testNegative03 avgt 15 23.737 ? 0.515 ns/op SecondarySupersLookup.testNegative04 avgt 15 26.097 ? 0.477 ns/op SecondarySupersLookup.testNegative05 avgt 15 28.460 ? 0.546 ns/op SecondarySupersLookup.testNegative06 avgt 15 31.025 ? 0.622 ns/op SecondarySupersLookup.testNegative07 avgt 15 32.070 ? 0.518 ns/op SecondarySupersLookup.testNegative08 avgt 15 34.656 ? 0.586 ns/op SecondarySupersLookup.testNegative09 avgt 15 36.140 ? 0.622 ns/op SecondarySupersLookup.testNegative10 avgt 15 38.304 ? 0.577 ns/op SecondarySupersLookup.testNegative16 avgt 15 49.672 ? 0.726 ns/op SecondarySupersLookup.testNegative20 avgt 15 57.241 ? 0.709 ns/op SecondarySupersLookup.testNegative30 avgt 15 76.189 ? 
0.804 ns/op SecondarySupersLookup.testNegative32 avgt 15 79.821 ? 0.809 ns/op SecondarySupersLookup.testNegative40 avgt 15 95.006 ? 0.999 ns/op SecondarySupersLookup.testNegative50 avgt 15 113.808 ? 0.933 ns/op SecondarySupersLookup.testNegative55 avgt 15 122.897 ? 1.237 ns/op SecondarySupersLookup.testNegative56 avgt 15 125.180 ? 1.005 ns/op SecondarySupersLookup.testNegative57 avgt 15 126.606 ? 0.925 ns/op SecondarySupersLookup.testNegative58 avgt 15 128.890 ? 0.809 ns/op SecondarySupersLookup.testNegative59 avgt 15 130.382 ? 1.092 ns/op SecondarySupersLookup.testNegative60 avgt 15 132.426 ? 1.045 ns/op SecondarySupersLookup.testNegative61 avgt 15 133.953 ? 1.062 ns/op SecondarySupersLookup.testNegative62 avgt 15 136.156 ? 0.974 ns/op SecondarySupersLookup.testNegative63 avgt 15 137.958 ? 1.172 ns/op SecondarySupersLookup.testNegative64 avgt 15 142.439 ? 4.703 ns/op SecondarySupersLookup.testPositive01 avgt 15 17.030 ? 0.218 ns/op SecondarySupersLookup.testPositive02 avgt 15 20.688 ? 0.793 ns/op SecondarySupersLookup.testPositive03 avgt 15 24.253 ? 0.566 ns/op SecondarySupersLookup.testPositive04 avgt 15 27.154 ? 0.495 ns/op SecondarySupersLookup.testPositive05 avgt 15 28.596 ? 0.892 ns/op SecondarySupersLookup.testPositive06 avgt 15 30.846 ? 0.304 ns/op SecondarySupersLookup.testPositive07 avgt 15 32.960 ? 0.564 ns/op SecondarySupersLookup.testPositive08 avgt 15 34.706 ? 0.377 ns/op SecondarySupersLookup.testPositive09 avgt 15 36.615 ? 0.453 ns/op SecondarySupersLookup.testPositive10 avgt 15 38.760 ? 0.462 ns/op SecondarySupersLookup.testPositive16 avgt 15 49.848 ? 0.489 ns/op SecondarySupersLookup.testPositive20 avgt 15 57.419 ? 0.467 ns/op SecondarySupersLookup.testPositive30 avgt 15 76.303 ? 0.530 ns/op SecondarySupersLookup.testPositive32 avgt 15 79.537 ? 0.323 ns/op SecondarySupersLookup.testPositive40 avgt 15 94.493 ? 0.565 ns/op SecondarySupersLookup.testPositive50 avgt 15 113.451 ? 1.932 ns/op SecondarySupersLookup.testPositive60 avgt 15 133.233 ? 
4.262 ns/op SecondarySupersLookup.testPositive63 avgt 15 137.090 ? 2.027 ns/op SecondarySupersLookup.testPositive64 avgt 15 138.784 ? 1.650 ns/op Finished running test 'micro:vm.lang.SecondarySupersLookup' With patch Benchmark Mode Cnt Score Error Units SecondarySupersLookup.testNegative00 avgt 15 12.621 ? 0.172 ns/op SecondarySupersLookup.testNegative01 avgt 15 12.614 ? 0.163 ns/op SecondarySupersLookup.testNegative02 avgt 15 12.619 ? 0.169 ns/op SecondarySupersLookup.testNegative03 avgt 15 12.621 ? 0.171 ns/op SecondarySupersLookup.testNegative04 avgt 15 12.617 ? 0.167 ns/op SecondarySupersLookup.testNegative05 avgt 15 12.618 ? 0.167 ns/op SecondarySupersLookup.testNegative06 avgt 15 12.626 ? 0.180 ns/op SecondarySupersLookup.testNegative07 avgt 15 12.621 ? 0.175 ns/op SecondarySupersLookup.testNegative08 avgt 15 12.623 ? 0.176 ns/op SecondarySupersLookup.testNegative09 avgt 15 12.626 ? 0.177 ns/op SecondarySupersLookup.testNegative10 avgt 15 12.625 ? 0.181 ns/op SecondarySupersLookup.testNegative16 avgt 15 12.625 ? 0.178 ns/op SecondarySupersLookup.testNegative20 avgt 15 12.624 ? 0.178 ns/op SecondarySupersLookup.testNegative30 avgt 15 12.635 ? 0.195 ns/op SecondarySupersLookup.testNegative32 avgt 15 12.638 ? 0.200 ns/op SecondarySupersLookup.testNegative40 avgt 15 12.644 ? 0.209 ns/op SecondarySupersLookup.testNegative50 avgt 15 12.646 ? 0.212 ns/op SecondarySupersLookup.testNegative55 avgt 15 51.029 ? 0.833 ns/op SecondarySupersLookup.testNegative56 avgt 15 51.484 ? 1.074 ns/op SecondarySupersLookup.testNegative57 avgt 15 51.170 ? 0.731 ns/op SecondarySupersLookup.testNegative58 avgt 15 51.775 ? 1.573 ns/op SecondarySupersLookup.testNegative59 avgt 15 51.000 ? 0.919 ns/op SecondarySupersLookup.testNegative60 avgt 15 73.169 ? 0.950 ns/op SecondarySupersLookup.testNegative61 avgt 15 73.537 ? 1.235 ns/op SecondarySupersLookup.testNegative62 avgt 15 75.116 ? 4.232 ns/op SecondarySupersLookup.testNegative63 avgt 15 153.908 ? 
1.126 ns/op
SecondarySupersLookup.testNegative64  avgt   15  155.937 ± 1.101  ns/op
SecondarySupersLookup.testPositive01  avgt   15   17.656 ± 0.220  ns/op
SecondarySupersLookup.testPositive02  avgt   15   17.012 ± 0.192  ns/op
SecondarySupersLookup.testPositive03  avgt   15   17.643 ± 0.200  ns/op
SecondarySupersLookup.testPositive04  avgt   15   17.640 ± 0.196  ns/op
SecondarySupersLookup.testPositive05  avgt   15   17.018 ± 0.203  ns/op
SecondarySupersLookup.testPositive06  avgt   15   17.643 ± 0.201  ns/op
SecondarySupersLookup.testPositive07  avgt   15   17.648 ± 0.208  ns/op
SecondarySupersLookup.testPositive08  avgt   15   17.652 ± 0.214  ns/op
SecondarySupersLookup.testPositive09  avgt   15   17.650 ± 0.207  ns/op
SecondarySupersLookup.testPositive10  avgt   15   17.026 ± 0.213  ns/op
SecondarySupersLookup.testPositive16  avgt   15   17.649 ± 0.209  ns/op
SecondarySupersLookup.testPositive20  avgt   15   17.652 ± 0.216  ns/op
SecondarySupersLookup.testPositive30  avgt   15   37.097 ± 0.301  ns/op
SecondarySupersLookup.testPositive32  avgt   15   37.351 ± 0.459  ns/op
SecondarySupersLookup.testPositive40  avgt   15   42.951 ± 1.256  ns/op
SecondarySupersLookup.testPositive50  avgt   15   17.642 ± 0.194  ns/op
SecondarySupersLookup.testPositive60  avgt   15   37.099 ± 0.300  ns/op
SecondarySupersLookup.testPositive63  avgt   15  153.592 ± 6.942  ns/op
SecondarySupersLookup.testPositive64  avgt   15  153.214 ± 2.070  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

@theRealAph : Hi, Can you take a look? Thanks

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2169167629

From gcao at openjdk.org Sat Jun 15 07:06:23 2024
From: gcao at openjdk.org (Gui Cao)
Date: Sat, 15 Jun 2024 07:06:23 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v7]
In-Reply-To: References: Message-ID:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
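
A simplified sketch of why a population-count instruction matters for this kind of lookup. This is not the HotSpot code: `PackedTable`, `home_slot`, `build` and `contains` are hypothetical names, the hash is an arbitrary stand-in, and the real table construction and probing differ in detail. The idea shown is only that a 64-bit occupancy bitmap rejects most misses with one bit test, and that counting the set bits at or below a slot yields the index into a densely packed array.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable population count. With Zbb this would be a single cpop instruction.
static int popcount64(uint64_t x) {
  int n = 0;
  while (x != 0) { x &= x - 1; ++n; }
  return n;
}

// Hypothetical 6-bit hash (slot 0..63); HotSpot derives its slot differently.
static unsigned home_slot(uint64_t key) {
  return static_cast<unsigned>((key * 0x9E3779B97F4A7C15ULL) >> 58);
}

struct PackedTable {
  uint64_t bitmap = 0;            // bit s set => logical slot s is occupied
  std::vector<uint64_t> entries;  // occupied slots only, in increasing slot order
};

// Build from at most 64 distinct keys, using linear probing over 64 logical slots.
static PackedTable build(const std::vector<uint64_t>& keys) {
  assert(keys.size() <= 64);
  uint64_t slots[64] = {0};
  PackedTable t;
  for (uint64_t k : keys) {
    unsigned s = home_slot(k);
    while ((t.bitmap >> s) & 1) s = (s + 1) & 63;  // next free slot, wrapping
    t.bitmap |= uint64_t(1) << s;
    slots[s] = k;
  }
  for (unsigned s = 0; s < 64; ++s) {
    if ((t.bitmap >> s) & 1) t.entries.push_back(slots[s]);
  }
  return t;
}

static bool contains(const PackedTable& t, uint64_t key) {
  unsigned s = home_slot(key);
  for (int probes = 0; probes < 64; ++probes) {
    if (((t.bitmap >> s) & 1) == 0) return false;  // empty slot: definite miss
    // Shift out the bits above s; the remaining set bits are the occupied
    // slots at positions <= s, so their count minus one is the packed index.
    int idx = popcount64(t.bitmap << (63 - s)) - 1;
    if (t.entries[idx] == key) return true;
    s = (s + 1) & 63;                              // collision: probe next slot
  }
  return false;  // every slot occupied and key not present
}
```

In this sketch a negative lookup usually ends at the first bit test, which is consistent with the flat `testNegative` timings reported for small tables, while a nearly full bitmap forces longer probe sequences.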
> > ### Correctness testing: > > - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) > - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) > - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs > Original: > > Benchmark Mode Cnt Score Error Units > SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op > SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 
0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ? 1.066 ns/op > SecondaryS... Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Put "secondary super table" generate code inside COMPILER2 macro ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19320/files - new: https://git.openjdk.org/jdk/pull/19320/files/142d7677..ec01d64b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=05-06 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From gcao at openjdk.org Sat Jun 15 07:12:13 2024 From: gcao at openjdk.org (Gui Cao) Date: Sat, 15 Jun 2024 07:12:13 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: <9UpkCN44laVTS1P7Eax4cZw82HdiSzuogLSaeXDdhPM=.c6694a54-1ec5-47e0-a846-02439f324dbb@github.com> On Fri, 14 Jun 2024 11:19:57 GMT, Hamlin Li wrote: >> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream/master' into JDK-8332587 >> - Update ins_cost for PartialSubtypeCheck >> - Code Format >> - Merge remote-tracking branch 'upstream/master' into JDK-8332587 >> - Polish Code Comment >> - Merge remote-tracking branch 'upstream/master' into JDK-8332587 >> - Fix Code format >> - Fix for Hamlin comment >> - Merge remote-tracking branch 'upstream/master' into JDK-8332587 >> - Fix client VM build >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/b0b13fc2...142d7677
>
> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5613:
>
>> 5611: }
>> 5612:
>> 5613: #ifdef COMPILER2
>
> Maybe put other "secondary super table" related code also inside COMPILER2 macro?

Hi, I've put the "secondary super table" related generation code inside the COMPILER2 macro. The related code in macroAssembler, which I guess the C1 optimization [1] may also use, is currently consistent with arm64, x86, etc., and has not been put inside the COMPILER2 macro.

[1] https://bugs.openjdk.org/browse/JDK-8331658

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1640837578

From kbarrett at openjdk.org Sat Jun 15 07:38:30 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 15 Jun 2024 07:38:30 GMT
Subject: RFR: 8333133: Simplify QuickSort::sort
In-Reply-To: References: Message-ID:

On Wed, 29 May 2024 18:52:03 GMT, Kim Barrett wrote:

> The "idempotent" argument is removed from that function, with associated
> simplifications to the implementation. Callers are updated to remove that
> argument. Callers that were providing a false value are unaffected in their
> behavior. The 3 callers that were providing a true value to request the
> associated feature are also unaffected (other than by being made faster),
> because the arrays involved don't contain any equivalent pairs.
>
> There are also some miscellaneous cleanups, including using the swap utility
> and fixing some comments.
>
> Testing: mach5 tier1-3

Thanks for the review.

> So .... IIUC

Not exactly.

> the only code that would be affected by this change would be code that
> passes true,

Correct.

> which could also have equivalent elements to sort,

Correct.

> and which requires the sort order to always be the same regardless of the
> order the elements are found.

It does not provide any such thing.
All the flag does is prevent swapping of equivalent elements, which doesn't give us any interesting additional ordering property. We can only detect the effect of the flag if there are elements that are equivalent according to the sort function but are distinguishable by some other means. Depending on what you mean, either

(1) All permutations of the sequence will sort to the same order. This isn't implementable.

(2) The sort doesn't change the relative order of equivalent elements, i.e. the sort is stable. Simple quicksort is not stable. It's possible to make it stable via O(N) extra memory, or (I've read) using a complex variant that is significantly slower. We're not doing either of those.

> I think only the archive related code cares about deterministic order, and
> package and module names should be unique, so this seems fine.

Correct.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19464#issuecomment-2169182340

From stuefe at openjdk.org Sat Jun 15 07:59:14 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 15 Jun 2024 07:59:14 GMT
Subject: RFR: 8333769: Pretouching tests dont test pretouching
In-Reply-To: References: Message-ID:

On Thu, 13 Jun 2024 13:24:42 GMT, Sonia Zaldana Calles wrote:

> Hi all,
>
> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769).
>
> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)).
>
> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes:
>
> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms.
> - Running the modified test with all collectors.
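
For reference, the Linux-only approach mentioned above amounts to reading the `VmRSS` line from `/proc/<pid>/status`, which reports resident set size in kB. A minimal sketch of that parsing step follows, assuming the usual `VmRSS:   <number> kB` layout; `parse_vm_rss_kb` is a hypothetical helper for illustration, not the code of the existing test or of the proposed `os::rss`.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the VmRSS value in kB from text in /proc/<pid>/status format,
// or -1 if no parsable VmRSS line is present.
static long parse_vm_rss_kb(std::istream& in) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 6, "VmRSS:") == 0) {
      std::istringstream fields(line.substr(6));
      long kb = -1;
      if (fields >> kb) return kb;  // operator>> skips the leading whitespace
      return -1;
    }
  }
  return -1;
}
```

On Linux this could be fed an `std::ifstream` opened on `/proc/self/status`. Other platforms have no such file, which is why a per-OS function behind a common interface is needed to generalize the test.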
> > Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. > > Looking forward to your comments, > Sonia Good in general. Does it run as part of GHAs? src/hotspot/os/aix/os_aix.cpp line 290: > 288: } > 289: > 290: julong os::rss() { return (julong)0; } Please make the return code a size_t src/hotspot/os/windows/os_windows.cpp line 868: > 866: BOOL ret = GetProcessMemoryInfo( > 867: GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS *)&pmex, sizeof(pmex)); > 868: if (ret != 0) { Suggestion: if (ret) { test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 2: > 1: /* > 2: * Copyright (c) 2014, 2024, Alibaba Group Holding Limited. All rights reserved. Add us pls, this is a significant change. test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 25: > 23: > 24: package gc; > 25: - Exclude for AIX - We can probably manage with less heap. We need a heap size that clearly sticks out above the background noise of normal RSS, but 512MB or 256 MB should probably do the trick too. test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 28: > 26: /** > 27: * @test id=ParallelCollector > 28: * @summary Tests AlwaysPreTouch Behavior, pages of java heap should be pretouched with AlwaysPreTouch enabled. This test reads RSS of test process, which should be bigger than heap size(1g) with AlwaysPreTouch enabled. Since we do this n times, I'd cut down the subject length to "test AlwaysPreTouch". test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 80: > 78: * @run main/othervm -Xbootclasspath/a:. 
-XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:+UseZGC -XX:-ZGenerational -Xmx1g -Xms1g -XX:+AlwaysPreTouch gc.TestAlwaysPreTouchBehavior > 79: */ > 80: stray newline test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 108: > 106: } > 107: Runtime runtime = Runtime.getRuntime(); > 108: long committedMemory = runtime.totalMemory() / 1024; // in kb Why divide by KB? Seems off. test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 109: > 107: Runtime runtime = Runtime.getRuntime(); > 108: long committedMemory = runtime.totalMemory() / 1024; // in kb > 109: Asserts.assertGreaterThanOrEqual(rss, committedMemory, "RSS of this process(" + rss + "kb) should be bigger than or equal to committed heap mem(" + committedMemory + "kb)"); Hmm, should be greater, really. Heap should be fully touched, and then we should have plenty touched memory that is not heap. ------------- PR Review: https://git.openjdk.org/jdk/pull/19699#pullrequestreview-2120250479 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640860382 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640856604 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640865166 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640864776 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640866947 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640867216 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640871782 PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1640863065 From aph at openjdk.org Sat Jun 15 08:35:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 15 Jun 2024 08:35:17 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v7] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 07:06:23 GMT, Gui Cao wrote: >> Implementation of subtype checking 
[JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. >> >> ### Correctness testing: >> >> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) >> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) >> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` >> >> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs >> Original: >> >> Benchmark Mode Cnt Score Error Units >> SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op >> SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 
0.840 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op >> Seco... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Put "secondary super table" generate code inside COMPILER2 macro > > It's worth running the "before" test with `-XX:-UseSecondarySupersCache`. This gives you a much better idea of the cost when you don't get any hit on the one-element secondary supers cache. > > @theRealAph : Hi, Can you take a look? Thanks That all looks like it's working as expected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2169208978 From mdoerr at openjdk.org Sat Jun 15 08:48:30 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 15 Jun 2024 08:48:30 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v8] In-Reply-To: References: Message-ID: > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.) > How can we verify it? By comparing the performance using the micro benchmarks? > > Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): > > Original > SecondarySuperCacheHits: 13.033 ?(99.9%) 0.058 ns/op [Average] > SecondarySuperCacheInterContention.test avgt 15 432.366 ? 8.364 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ? 8.460 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ? 10.819 ns/op > SecondarySuperCacheIntraContention.test avgt 15 355.192 ? 
3.597 ns/op > SecondarySupersLookup.testNegative00 avgt 15 12.274 ? 0.026 ns/op > SecondarySupersLookup.testNegative01 avgt 15 12.300 ? 0.039 ns/op > SecondarySupersLookup.testNegative02 avgt 15 12.304 ? 0.034 ns/op > SecondarySupersLookup.testNegative03 avgt 15 12.276 ? 0.050 ns/op > SecondarySupersLookup.testNegative04 avgt 15 12.235 ? 0.044 ns/op > SecondarySupersLookup.testNegative05 avgt 15 12.308 ? 0.156 ns/op > SecondarySupersLookup.testNegative06 avgt 15 12.291 ? 0.048 ns/op > SecondarySupersLookup.testNegative07 avgt 15 12.307 ? 0.052 ns/op > SecondarySupersLookup.testNegative08 avgt 15 12.398 ? 0.075 ns/op > SecondarySupersLookup.testNegative09 avgt 15 12.552 ? 0.122 ns/op > SecondarySupersLookup.testNegative10 avgt 15 12.490 ? 0.083 ns/op > SecondarySupersLookup.testNegative16 avgt 15 12.565 ? 0.092 ns/op > SecondarySupersLookup.testNegative20 avgt 15 19.059 ? 0.958 ns/op > SecondarySupersLookup.testNegative30 avgt 15 19.268 ? 0.124 ns/op > SecondarySupersLookup.testNegative32 avgt 15 20.059 ? 0.114 ns/op > SecondarySupersLookup.testNegative40 avgt 15 25.117 ? 0.368 ns/op > SecondarySupersLookup.testNegative50 avgt 15 32.735 ? 0.359 ns/op > SecondarySupersLookup.testNegative55 avgt 15 34.866 ? 0.152 ns/op > SecondarySupersLookup.testNegative56 avgt 15 35.492 ? 0.276 ns/op > SecondarySupersLookup.testNegative57 avgt 15 36.620 ? 0.334 ns/op > SecondarySupersLookup.testNegative58 avgt 15 37.226 ? 0.180 ns/op > SecondarySupersLookup.testNegative59 avgt 15 37.774 ? 0.241 ns/op > SecondarySupersLookup.testNegative60 avgt 15 38.627 ? 1.451 ns/op > SecondarySupersLookup.testNegative61 avgt 15 39.395 ? 0.249 ns/op > Seco... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Make sure UseSecondarySupersTable is only used on Power7 or later. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19368/files - new: https://git.openjdk.org/jdk/pull/19368/files/6136dbb9..5633ff25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=06-07 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368 PR: https://git.openjdk.org/jdk/pull/19368 From mdoerr at openjdk.org Sat Jun 15 08:54:15 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 15 Jun 2024 08:54:15 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v8] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 08:48:30 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.) >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): >> >> Original >> SecondarySuperCacheHits: 13.033 ?(99.9%) 0.058 ns/op [Average] >> SecondarySuperCacheInterContention.test avgt 15 432.366 ? 8.364 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ? 8.460 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ? 10.819 ns/op >> SecondarySuperCacheIntraContention.test avgt 15 355.192 ? 3.597 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 12.274 ? 0.026 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 12.300 ? 0.039 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 12.304 ? 0.034 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 12.276 ? 0.050 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 12.235 ? 
0.044 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 12.308 ? 0.156 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 12.291 ? 0.048 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 12.307 ? 0.052 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 12.398 ? 0.075 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 12.552 ? 0.122 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 12.490 ? 0.083 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 12.565 ? 0.092 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 19.059 ? 0.958 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 19.268 ? 0.124 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 20.059 ? 0.114 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 25.117 ? 0.368 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 32.735 ? 0.359 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 34.866 ? 0.152 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 35.492 ? 0.276 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 36.620 ? 0.334 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 37.226 ? 0.180 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 37.774 ? 0.241 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 38.627 ? 1.451 ns/op >> Sec... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Make sure UseSecondarySupersTable is only used on Power7 or later. I've added a check for Power7 because [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859) is not yet implemented. It should get removed by that one again. I think we should have this check in case this PR gets backported. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2169218293 From ayang at openjdk.org Sat Jun 15 08:54:14 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 15 Jun 2024 08:54:14 GMT Subject: RFR: 8333962: Obsolete OldSize [v2] In-Reply-To: References: Message-ID: <--FcnwH_fA8e27VK3SwN0vJZZW8yZsyxh2I2jul-Enk=.3cd55bc7-b0ac-4ef6-93a1-b880ff255bd3@github.com> On Sat, 15 Jun 2024 05:09:49 GMT, David Holmes wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> obsolete-old-size > > src/hotspot/share/runtime/arguments.cpp line 37: > >> 35: #include "gc/shared/gcArguments.hpp" >> 36: #include "gc/shared/gcConfig.hpp" >> 37: #include "gc/shared/genArguments.hpp" > > Why is this needed? `Arguments::set_heap_size` accesses `OldSize`, which is declared in this header. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19647#discussion_r1640921256 From aph at openjdk.org Sat Jun 15 08:54:15 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 15 Jun 2024 08:54:15 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 07:36:52 GMT, Andrew Haley wrote: >> Performance seems to be not affected by that bug. Note that I have used https://github.com/openjdk/jdk/pull/19427 to run TypePollution micro benchmarks. > >> Performance seems to be not affected by that bug. > > That is extremely suspicious. > That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue? I've never seen this. It must be a regression. I'll have a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2169219557 From aph at openjdk.org Sat Jun 15 08:54:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 15 Jun 2024 08:54:17 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v5] In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 13:35:39 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove pointless assertion. > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2166: > >> 2164: >> 2165: // Return true: we succeeded in generating this code >> 2166: bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass, > > The method always returns `true`. Should even return a value? It's there to communicate failure, if there was any. Some ports can fail to generate code because of space exhaustion, and we need to communicate this to the caller. > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2232: > >> 2230: >> 2231: // Linear probe. Rotate the bitmap so that the next bit to test is >> 2232: // in Bit 1. > > It's bit 2 that's tested next after the rotation, isn't it? See L2331 in `lookup_secondary_supers_table_slow_path` > Suggestion: > > // in Bit 2. Yes, it's rather confusing language. In fact, the bit we just tested is in Bit 1. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1640920059 PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1640918956 From aph at openjdk.org Sat Jun 15 09:06:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 15 Jun 2024 09:06:13 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: <4rR63y358m-xvin43-ltLouSYYUOO3igQ-8_f5OqF-o=.77e54e27-0e1a-4a46-82fa-a6446379a659@github.com> On Sat, 15 Jun 2024 08:51:18 GMT, Andrew Haley wrote: > That doesn't look like a platform specific thing. 
I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue? Ah, I see. The test is doing some IR node counts for Klass loads, and `-UseSecondarySupersCache` deletes one of those loads. So it's not actually a behavioural change. I'm not sure what to do about this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2169228092 From amitkumar at openjdk.org Sat Jun 15 10:40:37 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 15 Jun 2024 10:40:37 GMT Subject: RFR: 8332602: [s390x] Improve itable_stub [v4] In-Reply-To: References: Message-ID: <35Us1SSW89u3UExrFQvmdpn3GmDD2InhNE8Cu6wsXso=.7a1c0db1-8577-42b3-b6bb-ab5a0b10d556@github.com> > s390x Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Testing: I ran `tier1` test on fastdebug & release VM; I didn't see any regression there; > > Benchmarking: > > Without Patch: > Benchmark Mode Cnt Score Error Units > InterfaceCalls.test1stInt2Types avgt 12 1.924 ? 0.001 ns/op > InterfaceCalls.test1stInt3Types avgt 12 13.925 ? 0.014 ns/op > InterfaceCalls.test1stInt5Types avgt 12 16.591 ? 0.045 ns/op > InterfaceCalls.test2ndInt2Types avgt 12 2.028 ? 0.013 ns/op > InterfaceCalls.test2ndInt3Types avgt 12 7.634 ? 0.049 ns/op > InterfaceCalls.test2ndInt5Types avgt 12 16.231 ? 1.222 ns/op > InterfaceCalls.testIfaceCall avgt 12 16.587 ? 0.058 ns/op > InterfaceCalls.testIfaceExtCall avgt 12 17.532 ? 0.024 ns/op > InterfaceCalls.testMonomorphic avgt 12 0.746 ? 0.001 ns/op > Finished running test 'micro:vm.compiler.InterfaceCalls' > > > With Patch: > > Benchmark Mode Cnt Score Error Units > InterfaceCalls.test1stInt2Types avgt 12 1.929 ? 0.012 ns/op > InterfaceCalls.test1stInt3Types avgt 12 13.280 ? 0.093 ns/op > InterfaceCalls.test1stInt5Types avgt 12 16.169 ? 0.364 ns/op > InterfaceCalls.test2ndInt2Types avgt 12 6.758 ? 
4.473 ns/op > InterfaceCalls.test2ndInt3Types avgt 12 11.772 ? 2.411 ns/op > InterfaceCalls.test2ndInt5Types avgt 12 15.099 ? 0.081 ns/op > InterfaceCalls.testIfaceCall avgt 12 15.972 ? 0.021 ns/op > InterfaceCalls.testIfaceExtCall avgt 12 16.600 ? 0.322 ns/op > InterfaceCalls.testMonomorphic avgt 12 0.746 ? 0.001 ns/op > Finished running test 'micro:vm.compiler.InterfaceCalls' Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/s390/macroAssembler_s390.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19698/files - new: https://git.openjdk.org/jdk/pull/19698/files/19c3673c..22e33189 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19698&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19698&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19698.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19698/head:pull/19698 PR: https://git.openjdk.org/jdk/pull/19698 From amitkumar at openjdk.org Sat Jun 15 10:40:37 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 15 Jun 2024 10:40:37 GMT Subject: RFR: 8332602: [s390x] Improve itable_stub [v3] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 02:58:38 GMT, Amit Kumar wrote: >> s390x Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) >> >> Testing: I ran `tier1` test on fastdebug & release VM; I didn't see any regression there; >> >> Benchmarking: >> >> Without Patch: >> Benchmark Mode Cnt Score Error Units >> InterfaceCalls.test1stInt2Types avgt 12 1.924 ? 0.001 ns/op >> InterfaceCalls.test1stInt3Types avgt 12 13.925 ? 0.014 ns/op >> InterfaceCalls.test1stInt5Types avgt 12 16.591 ? 0.045 ns/op >> InterfaceCalls.test2ndInt2Types avgt 12 2.028 ? 0.013 ns/op >> InterfaceCalls.test2ndInt3Types avgt 12 7.634 ? 
0.049 ns/op >> InterfaceCalls.test2ndInt5Types avgt 12 16.231 ? 1.222 ns/op >> InterfaceCalls.testIfaceCall avgt 12 16.587 ? 0.058 ns/op >> InterfaceCalls.testIfaceExtCall avgt 12 17.532 ? 0.024 ns/op >> InterfaceCalls.testMonomorphic avgt 12 0.746 ? 0.001 ns/op >> Finished running test 'micro:vm.compiler.InterfaceCalls' >> >> >> With Patch: >> >> Benchmark Mode Cnt Score Error Units >> InterfaceCalls.test1stInt2Types avgt 12 1.929 ? 0.012 ns/op >> InterfaceCalls.test1stInt3Types avgt 12 13.280 ? 0.093 ns/op >> InterfaceCalls.test1stInt5Types avgt 12 16.169 ? 0.364 ns/op >> InterfaceCalls.test2ndInt2Types avgt 12 6.758 ? 4.473 ns/op >> InterfaceCalls.test2ndInt3Types avgt 12 11.772 ? 2.411 ns/op >> InterfaceCalls.test2ndInt5Types avgt 12 15.099 ? 0.081 ns/op >> InterfaceCalls.testIfaceCall avgt 12 15.972 ? 0.021 ns/op >> InterfaceCalls.testIfaceExtCall avgt 12 16.600 ? 0.322 ns/op >> InterfaceCalls.testMonomorphic avgt 12 0.746 ? 0.001 ns/op >> Finished running test 'micro:vm.compiler.InterfaceCalls' > > Amit Kumar has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - polishing src/hotspot/cpu/s390/macroAssembler_s390.cpp line 2903: > 2901: z_bru(L_loop_search_resolved); > 2902: > 2903: bind(L_resolved_found); Suggestion: // See if we already have a holder klass. If not, go and scan for it. 
bind(L_resolved_found);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19698#discussion_r1641002482

From amitkumar at openjdk.org Sat Jun 15 11:00:19 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sat, 15 Jun 2024 11:00:19 GMT
Subject: RFR: 8332603: [ppc] Improve itable_stub
Message-ID:

PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352)

-------------

Commit messages:
 - ppc port

Changes: https://git.openjdk.org/jdk/pull/19733/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19733&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332603
  Stats: 154 lines in 3 files changed: 135 ins; 7 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/19733.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19733/head:pull/19733

PR: https://git.openjdk.org/jdk/pull/19733

From amitkumar at openjdk.org Sat Jun 15 13:12:12 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sat, 15 Jun 2024 13:12:12 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well
In-Reply-To:
References:
Message-ID:

On Tue, 4 Jun 2024 15:19:51 GMT, Amit Kumar wrote:

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>
> Benchmark                             Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01  avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02  avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03  avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04  avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05  avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06  avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07  avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08  avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09  avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10  avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16  avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20  avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30  avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32  avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40  avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50  avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55  avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56  avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57  avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58  avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59  avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60  avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61  avgt   15  27.763 ± 1.628  ns...

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3417: > 3415: // r_result should have either 0 or 1 value > 3416: NearLabel check_0, check_1; > 3417: Suggestion: // r_result should have either 0 or 1 value src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3420: > 3418: // check for 0 > 3419: z_chi(r_result, 0); > 3420: asm_assert(bcondNotLow, "r_result >= 0", 33); Suggestion: asm_assert(bcondNotLow, "r_result should be equal or greater than 0", 33); src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3424: > 3422: // check for 1 > 3423: z_chi(r_result, 1); > 3424: asm_assert(bcondNotHigh, "r_result <= 1", 33); Suggestion: asm_assert(bcondNotHigh, "r_result should be equal or less than 1", 33); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1641153358 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1641154603 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1641155110 From amitkumar at openjdk.org Sat Jun 15 13:15:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 15 Jun 2024 13:15:38 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ? 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ? 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ? 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ? 
0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ? 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ? 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ? 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ? 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ? 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ? 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ? 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ? 0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ? 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ? 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ? 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ? 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ? 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ? 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ? 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ? 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ? 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ? 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ? 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ? 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ? 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ? 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ? 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ? 1.628 ns... 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: Removes unused Labels & makes comment more sensible ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/f5837a9b..d2d56f8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From amitkumar at openjdk.org Sat Jun 15 13:25:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 15 Jun 2024 13:25:15 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v8] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 08:48:30 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.) >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): >> >> Original >> SecondarySuperCacheHits: 13.033 ?(99.9%) 0.058 ns/op [Average] >> SecondarySuperCacheInterContention.test avgt 15 432.366 ? 8.364 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ? 8.460 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ? 10.819 ns/op >> SecondarySuperCacheIntraContention.test avgt 15 355.192 ? 3.597 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 12.274 ? 0.026 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 12.300 ? 0.039 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 12.304 ? 
0.034 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 12.276 ? 0.050 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 12.235 ? 0.044 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 12.308 ? 0.156 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 12.291 ? 0.048 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 12.307 ? 0.052 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 12.398 ? 0.075 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 12.552 ? 0.122 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 12.490 ? 0.083 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 12.565 ? 0.092 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 19.059 ? 0.958 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 19.268 ? 0.124 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 20.059 ? 0.114 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 25.117 ? 0.368 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 32.735 ? 0.359 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 34.866 ? 0.152 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 35.492 ? 0.276 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 36.620 ? 0.334 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 37.226 ? 0.180 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 37.774 ? 0.241 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 38.627 ? 1.451 ns/op >> Sec... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Make sure UseSecondarySupersTable is only used on Power7 or later. I have used it as a base to implement s390x changes, so gone through it multiple times; Also ran tier1 test ( on power 8 machine) with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers`; LGTM. ------------- Marked as reviewed by amitkumar (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/19368#pullrequestreview-2120438451 From amitkumar at openjdk.org Sat Jun 15 14:37:37 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 15 Jun 2024 14:37:37 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: > PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: consistency ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19733/files - new: https://git.openjdk.org/jdk/pull/19733/files/ba205309..57fbfe69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19733&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19733&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19733.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19733/head:pull/19733 PR: https://git.openjdk.org/jdk/pull/19733 From amitkumar at openjdk.org Sat Jun 15 14:37:37 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 15 Jun 2024 14:37:37 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 14:34:41 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > consistency src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 573: > 571: Register r_temp2, > 572: int itable_index, > 573: Label& L_no_such_interface); Suggestion: void lookup_interface_method_stub(Register recv_klass, Register holder_klass, Register resolved_klass, Register method_result, Register temp, Register temp2, int itable_index, Label& 
L_no_such_interface); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19733#discussion_r1641202443 From amitkumar at openjdk.org Sat Jun 15 16:14:21 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 15 Jun 2024 16:14:21 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > consistency got some failures, looking into those; Got failures in CDS, when running the test with exploded-jdk; Running in optimized-jdk shows that test are passing; ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2170031487 PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2170101973 From lmesnik at openjdk.org Sat Jun 15 16:28:05 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 15 Jun 2024 16:28:05 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share [v3] In-Reply-To: References: Message-ID: > The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. > The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. > > Also, this PR moves test-specific code into corresponding test directories to increase code locality. This allows later easier move tests from vmTestbase. 
>
> The few remaining classes include
> InMemoryJavaCompiler.java
> that is very similar to the same class from the standard testlibrary and could be merged with it and
> ProcessUtils.java
> which is used by
> test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java
> and thus should be moved into the standard testlibrary.
> The stack and options might be merged in nsk/share test library.

Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:

  removed unused import

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19727/files
  - new: https://git.openjdk.org/jdk/pull/19727/files/f8a637dc..53757d6f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19727&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19727&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19727.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19727/head:pull/19727

PR: https://git.openjdk.org/jdk/pull/19727

From amitkumar at openjdk.org Sun Jun 16 08:09:11 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sun, 16 Jun 2024 08:09:11 GMT
Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote:

>> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352)
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   consistency

Testing (Power 8 - ppc-le):

==============================
Test summary
==============================
   TEST                              TOTAL  PASS  FAIL  ERROR
>> jtreg:test/hotspot/jtreg:tier1     2470  2469     1      0 <<
>> jtreg:test/jdk:tier1               2413  2412     1      0 <<
>> jtreg:test/langtools:tier1         4534  4533     1      0 <<
   jtreg:test/jaxp:tier1                 0     0     0      0
   jtreg:test/lib-test:tier1            33    33     0      0
==============================
TEST FAILURE

root@crampon1:~/amit/jdk# cat $(find . -name newfailures.txt)
# newfailures.txt
# newfailures.txt
gtest/GTestWrapper.java
# newfailures.txt
java/util/ResourceBundle/Control/MissingResourceCauseTestRun.java
# newfailures.txt
jdk/javadoc/doclet/testIOException/TestIOException.java
# newfailures.txt
# newfailures.txt

Two tests are failing with the `root` user; there are already issues on JBS: [TestIOException.java](https://bugs.openjdk.org/browse/JDK-8334195) & [MissingResourceCauseTestRun.java](https://bugs.openjdk.org/browse/JDK-8334333)

`GTestWrapper.java` is failing due to unspecified `-nativepath`;

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2171175854

From amitkumar at openjdk.org Sun Jun 16 09:49:42 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sun, 16 Jun 2024 09:49:42 GMT
Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v6]
In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com>
References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com>
Message-ID:

> s390x port for recursive locking.
>
> testing:
> - [x] build fastdebug-vm
> - [x] build slowdebug-vm
> - [x] build release-vm
> - [x] build optimized-vm
> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (release-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] tier1 with fastdebug-vm
> - [x] tier1 with slowdebug-vm
> - [x] tier1 with release-vm
>
> *BenchMarks*:
>
> Results from Performance LPARs :
>
>
> Locking Mode = 1 (without Patch)
>
> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
> LockUnlock.testContendedLock                      100  avgt   12      5.144 ±  0.035  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12   3824.742 ± 89.475  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     25.348 ±  0.559  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    466.629 ±  3.036  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    468.532 ±  1.793  ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
> Locking Mode = 1 (with patch)
>
> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
> LockUnlock.testContendedLock                      100  avgt   12      5.146 ±  0.027  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12   3833.175 ± 75.863  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     25.206 ±  0.519  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    473.973 ±  2.103  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    470.749 ±  2.229  ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
>
>
>
> Locking Mode = 2 (without Patch)
>
> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
> LockUnlock.testContendedLock                      100  avgt   12      4.688 ±  0.051  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12  12800.544 ± 92.265  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     26.486 ±  2.229  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    424.499 ±  0.416  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    424.241 ±  0.840  ns/op
> Finished running test 'micro:vm.lang.Lo...

Amit Kumar has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: - Merge branch 'master' into recursive_locking_v1 - not using load_const_optimized in compiler_fast_lock_lightweight_object - minor code formatting & variable renamings - revert DiagnoseSyncOnValueBasedClasses changes from c1 - suggestions from Axel - Merge branch 'master' into recursive_locking_v1 - s390x recursive locking port ------------- Changes: https://git.openjdk.org/jdk/pull/18878/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=05 Stats: 553 lines in 9 files changed: 426 ins; 64 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/18878.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18878/head:pull/18878 PR: https://git.openjdk.org/jdk/pull/18878 From jpai at openjdk.org Sun Jun 16 11:44:24 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Sun, 16 Jun 2024 11:44:24 GMT Subject: RFR: 8331552: Update to use jtreg 7.4 In-Reply-To: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> References: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> Message-ID: On Thu, 2 May 2024 09:48:51 GMT, Christian Stein wrote: > Please review the change to update to using `jtreg` **7.4**. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files. > > Testing: _tier1-tier5 pending..._ Looks OK to me. ------------- Marked as reviewed by jpai (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19052#pullrequestreview-2121354181 From jpai at openjdk.org Sun Jun 16 11:47:24 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Sun, 16 Jun 2024 11:47:24 GMT Subject: RFR: 8331552: Update to use jtreg 7.4 In-Reply-To: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> References: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> Message-ID: On Thu, 2 May 2024 09:48:51 GMT, Christian Stein wrote: > Please review the change to update to using `jtreg` **7.4**. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files. > > Testing: _tier1-tier5 pending..._ Hello Christian, it would be better to merge latest master branch into this PR before integrating this. It looks like it currently uses `master` from more than a month back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19052#issuecomment-2171466397 From gziemski at openjdk.org Mon Jun 17 02:22:16 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 17 Jun 2024 02:22:16 GMT Subject: RFR: 8333994: NMT: call stacks should show source information In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 12:38:09 GMT, Thomas Stuefe wrote: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. The implementation of `NativeCallStack::print_on()` is basically: > if (os::dll_address_to_function_name() { > Decoder::get_source_info() // calls into os::dll_address_to_library_name() > } > if (not printed the info yet) { > os::dll_address_to_library_name() > } where `os::dll_address_to_function_name()` and `os::dll_address_to_library_name()` each call `dladdr()`, so we end up with 2 calls to `dladdr()` in each case. 
If that's what takes up most of the time, then perhaps we can find a way to optimize this code by sharing the context returned by `dladdr()` and we can then add the fancy feature, such as `line` info? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2172050666 From dholmes at openjdk.org Mon Jun 17 04:51:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Jun 2024 04:51:11 GMT Subject: RFR: 8333133: Simplify QuickSort::sort In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 07:35:41 GMT, Kim Barrett wrote: > It does not provide any such thing. All the flag does is prevent swapping of equivalent elements, which doesn't give us any interesting additional ordering property. I only meant the sort order of the equivalent elements would be maintained. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19464#issuecomment-2172231564 From dholmes at openjdk.org Mon Jun 17 04:53:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Jun 2024 04:53:12 GMT Subject: RFR: 8333962: Obsolete OldSize [v2] In-Reply-To: <--FcnwH_fA8e27VK3SwN0vJZZW8yZsyxh2I2jul-Enk=.3cd55bc7-b0ac-4ef6-93a1-b880ff255bd3@github.com> References: <--FcnwH_fA8e27VK3SwN0vJZZW8yZsyxh2I2jul-Enk=.3cd55bc7-b0ac-4ef6-93a1-b880ff255bd3@github.com> Message-ID: <3di2So2Uj6-xau81y4dtdpAG2_GlIHNUd8KTyEHUtTs=.df3c5bab-cd31-4c9b-b905-265e8beb2d10@github.com> On Sat, 15 Jun 2024 08:51:56 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/runtime/arguments.cpp line 37: >> >>> 35: #include "gc/shared/gcArguments.hpp" >>> 36: #include "gc/shared/gcConfig.hpp" >>> 37: #include "gc/shared/genArguments.hpp" >> >> Why is this needed? > > `Arguments::set_heap_size` accesses `OldSize`, which is declared in this header. Got it. 
Thanks

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19647#discussion_r1642158519

From varadam at openjdk.org Mon Jun 17 05:05:12 2024
From: varadam at openjdk.org (Varada M)
Date: Mon, 17 Jun 2024 05:05:12 GMT
Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote:

>> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352)
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   consistency

There are no related failures in tier1 with release and fastdebug on aix-ppc.
Here are the benchmark results.

Before patch:

Benchmark                        Mode  Cnt   Score   Error  Units
InterfaceCalls.test1stInt2Types  avgt   12  18.084 ± 0.160  ns/op
InterfaceCalls.test1stInt3Types  avgt   12  26.789 ± 0.282  ns/op
InterfaceCalls.test1stInt5Types  avgt   12  26.926 ± 0.438  ns/op
InterfaceCalls.test2ndInt2Types  avgt   12  18.500 ± 0.215  ns/op
InterfaceCalls.test2ndInt3Types  avgt   12  30.996 ± 0.489  ns/op
InterfaceCalls.test2ndInt5Types  avgt   12  28.395 ± 0.215  ns/op
InterfaceCalls.testIfaceCall     avgt   12  26.749 ± 0.238  ns/op
InterfaceCalls.testIfaceExtCall  avgt   12  28.189 ± 0.395  ns/op
InterfaceCalls.testMonomorphic   avgt   12  15.251 ± 0.126  ns/op
Finished running test 'micro:vm.compiler.InterfaceCalls'
Test report is stored in build/aix-ppc64-server-release/test-results/micro_vm_compiler_InterfaceCalls

==============================
Test summary
==============================
   TEST                              TOTAL  PASS  FAIL  ERROR
   micro:vm.compiler.InterfaceCalls      1     1     0      0
==============================
TEST SUCCESS

Finished building target 'test' in configuration 'aix-ppc64-server-release'

After patch:

Benchmark                        Mode  Cnt   Score   Error  Units
InterfaceCalls.test1stInt2Types  avgt   12  18.078 ± 0.133  ns/op
InterfaceCalls.test1stInt3Types  avgt   12  26.816 ± 0.276  ns/op
InterfaceCalls.test1stInt5Types  avgt   12  26.918 ± 0.233  ns/op
InterfaceCalls.test2ndInt2Types  avgt   12  18.674 ± 0.204  ns/op
InterfaceCalls.test2ndInt3Types  avgt   12  30.360 ± 0.251  ns/op
InterfaceCalls.test2ndInt5Types  avgt   12  28.356 ± 0.394  ns/op
InterfaceCalls.testIfaceCall     avgt   12  26.676 ± 0.188  ns/op
InterfaceCalls.testIfaceExtCall  avgt   12  27.613 ± 0.258  ns/op
InterfaceCalls.testMonomorphic   avgt   12  15.175 ± 0.090  ns/op
Finished running test 'micro:vm.compiler.InterfaceCalls'
Test report is stored in build/aix-ppc64-server-release/test-results/micro_vm_compiler_InterfaceCalls

==============================
Test summary
==============================
   TEST                              TOTAL  PASS  FAIL  ERROR
   micro:vm.compiler.InterfaceCalls      1     1     0      0
==============================
TEST SUCCESS

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2172244327

From dholmes at openjdk.org Mon Jun 17 05:37:11 2024
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 17 Jun 2024 05:37:11 GMT
Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions
In-Reply-To:
References:
Message-ID: <3dZ0rNWOpXwirrj4n6liCkvvnjPb3F69AJAx1U-R1Wc=.49cf4541-ed1a-4fbb-9499-02a7cb461b42@github.com>

On Fri, 14 Jun 2024 13:39:14 GMT, Matthias Baesken wrote:

> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there).
> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp).
> Currently something like this is used :
>
> #if defined(__clang__) || defined(__GNUC__)
> __attribute__((no_sanitize("undefined")))
> #endif

In keeping with address.hpp and leak.hpp the header would either be called undefinedbehaviour.hpp or just ub.hpp for short.

I think the actual attribute would read better if placed on the line before the function definition.

Thanks.

-------------

Changes requested by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19722#pullrequestreview-2121918399

From jbhateja at openjdk.org Mon Jun 17 05:53:35 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 17 Jun 2024 05:53:35 GMT
Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v3]
In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com>
References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com>
Message-ID: <8bWJmx1khK66SJPV3THbBLzDych3zwt5VlRSTeAViOU=.de4d309c-cd56-4963-97f0-3735046d6c27@github.com>

> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64-bit general purpose registers, also known as Extended General Purpose Registers, in IA-32e 64-bit mode.
>
> Summary of changes introduced along with this patch:
>
> 1. C2 compiler register allocation support.
> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services.
> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation.
> 4. Applicable extensions to native interface used by runtime for patching instruction.
>
> We plan to address C1 register support in a subsequent patch as there are hard upper bound allocation limits
> (currently set to r11) imposed by the existing implementation of the linear scan algorithm, after which it reserves
> the remaining registers for special purposes.
>
> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR registers by manually modifying the register sequences in the relevant allocation class.
>
> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes
> found during testing.
>
> Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: jvmci test failures fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19042/files - new: https://git.openjdk.org/jdk/pull/19042/files/e92349ff..f13a5574 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=01-02 Stats: 68 lines in 2 files changed: 22 ins; 0 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From stuefe at openjdk.org Mon Jun 17 06:16:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 06:16:12 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions In-Reply-To: <3dZ0rNWOpXwirrj4n6liCkvvnjPb3F69AJAx1U-R1Wc=.49cf4541-ed1a-4fbb-9499-02a7cb461b42@github.com> References: <3dZ0rNWOpXwirrj4n6liCkvvnjPb3F69AJAx1U-R1Wc=.49cf4541-ed1a-4fbb-9499-02a7cb461b42@github.com> Message-ID: On Mon, 17 Jun 2024 05:34:33 GMT, David Holmes wrote: > In keeping with address.hpp and leak.hpp the header would either be called undefinedbehaviour.hpp or just ub.hpp for short. > > I think the actual attribute would read better if placed on the line before the function definition. > > Thanks. I vote for ubsan.hpp and asan.hpp, or for a common header sanitizer.hpp. 
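
A minimal sketch of the macro under discussion in this thread, for readers following along. It is not the code from the pull request: the name `ATTRIBUTE_NO_UBSAN` is only assumed by analogy with `ATTRIBUTE_NO_ASAN`, the header it would live in is still being debated above, and `hash_step` is a hypothetical function whose only purpose is to show the attribute on the line before the definition.

```cpp
#include <cstdint>

// Assumed macro name, chosen by analogy with ATTRIBUTE_NO_ASAN; the pull
// request may settle on a different name or header.
#if defined(__clang__) || defined(__GNUC__)
#define ATTRIBUTE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define ATTRIBUTE_NO_UBSAN
#endif

// Hypothetical function, used only to illustrate attribute placement.
// The arithmetic is unsigned, so it wraps around rather than overflowing.
ATTRIBUTE_NO_UBSAN
static uint32_t hash_step(uint32_t hash, uint32_t value) {
  return hash * 31u + value;
}
```

With a macro of this shape, each excluded function carries a single annotation line instead of a repeated three-line `#if` block, and compilers other than Clang and GCC see an empty expansion.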
------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2172368124 From duke at openjdk.org Mon Jun 17 06:19:21 2024 From: duke at openjdk.org (Liming Liu) Date: Mon, 17 Jun 2024 06:19:21 GMT Subject: Integrated: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved In-Reply-To: References: Message-ID: <4jE8-p7IJwhxqp6lnnh6WI8SwLHxfIWPGp6V7I_oHdA=.e2307e02-a3ba-4684-a3ce-1fad375793e3@github.com> On Wed, 3 Apr 2024 08:12:22 GMT, Liming Liu wrote: > The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. This pull request has now been integrated. Changeset: 31e8deba Author: Liming Liu Committer: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/31e8debae63e008da79e403bcb870a7be631af2c Stats: 13 lines in 3 files changed: 4 ins; 3 del; 6 mod 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved 8325218: gc/parallel/TestAlwaysPreTouchBehavior.java fails Reviewed-by: stefank, jsjolen, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/18592 From cstein at openjdk.org Mon Jun 17 06:21:12 2024 From: cstein at openjdk.org (Christian Stein) Date: Mon, 17 Jun 2024 06:21:12 GMT Subject: RFR: 8331431: Update to use jtreg 7.4 In-Reply-To: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> References: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> Message-ID: <0uNmeiETRIRQQ36f2H_cMJ9jygo2A9-MuhJHnhyZmCw=.f3e4fd02-f00c-4961-9347-c959a0ee9e35@github.com> On Thu, 2 May 2024 09:48:51 GMT, Christian Stein 
wrote: > Please review the change to update to using `jtreg` **7.4**. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files. > > Tested: tier 1 – tier 5 Update the title to use the correct JBS issue number. > Looks good to me. I assume that you have run an extensive set of tests to verify that this does not break, even in higher tiers? Yes. See below. > Hello Christian, it would be better to merge latest master branch into this PR before integrating this. It looks like it currently uses master from more than a month back. Latest tier 1-5 tests were run against late last week's HEAD revision. So, all looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19052#issuecomment-2172374402 From stuefe at openjdk.org Mon Jun 17 06:39:45 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 06:39:45 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS [v2] In-Reply-To: References: Message-ID: > An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK. > > Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included. > > > ---- > > Note that I was torn between two ways to fix this: > > - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property > - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`). See this draft PR: https://github.com/openjdk/jdk/pull/19178 . > > The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required. 
> > I can go either way, though I have a slight preference for this PR, which is why I posted it. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build - JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19188/files - new: https://git.openjdk.org/jdk/pull/19188/files/179d00b3..b9d17dce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19188&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19188&range=00-01 Stats: 135873 lines in 2635 files changed: 90868 ins; 31523 del; 13482 mod Patch: https://git.openjdk.org/jdk/pull/19188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19188/head:pull/19188 PR: https://git.openjdk.org/jdk/pull/19188 From rehn at openjdk.org Mon Jun 17 06:48:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 06:48:17 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 08:51:33 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - Cleanup >> - ... 
and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 60: > >> 58: }; >> 59: >> 60: address destination(nmethod *nm = nullptr) const; > > unused argument This pre-exsisting, fixed. > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 61: > >> 59: >> 60: address destination(nmethod *nm = nullptr) const; >> 61: void set_destination(address new_destination); > > unused method Used in NativeShortCall::reloc_set_destination/NativeShortCall::set_destination_mt_safe: `NativeShortCallTrampolineStub::at(trampoline_stub_addr)->set_destination(dest)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642260368 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642260028 From mbaesken at openjdk.org Mon Jun 17 06:49:41 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 17 Jun 2024 06:49:41 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v2] In-Reply-To: References: <3dZ0rNWOpXwirrj4n6liCkvvnjPb3F69AJAx1U-R1Wc=.49cf4541-ed1a-4fbb-9499-02a7cb461b42@github.com> Message-ID: On Mon, 17 Jun 2024 06:13:08 GMT, Thomas Stuefe wrote: > I think the actual attribute would read better if placed on the line before the function definition. I placed it on the line before the function. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2172424707 From mbaesken at openjdk.org Mon Jun 17 06:49:41 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 17 Jun 2024 06:49:41 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v2] In-Reply-To: References: Message-ID: > A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). > We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). 
> Currently something like this is used : > > #if defined(__clang__) || defined(__GNUC__) > __attribute__((no_sanitize("undefined"))) > #endif Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: move ATTRIBUTE_NO_UBSAN to a separate line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19722/files - new: https://git.openjdk.org/jdk/pull/19722/files/d5d5cedd..735a6871 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=00-01 Stats: 6 lines in 3 files changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19722.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19722/head:pull/19722 PR: https://git.openjdk.org/jdk/pull/19722 From rehn at openjdk.org Mon Jun 17 06:56:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 06:56:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 08:56:22 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - Cleanup >> - ... and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 982: > >> 980: >> 981: void MacroAssembler::load_link(const address source, Register temp) { >> 982: assert(temp != noreg && temp != x0, "expecting a register"); > > with `temp == x5`, this assert is redundant. > A question, why require `temp == x5`? No, reason, fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642268548 From jsjolen at openjdk.org Mon Jun 17 07:14:16 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 17 Jun 2024 07:14:16 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v18] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 06:45:26 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove dead CHeap allocator test > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 79: > >> 77: >> 78: void free(I i) { >> 79: assert(i != nil || (i > 0 && i < _backing_storage.length()), "out of bounds free"); > > I think there are some errors here. This is probably broken. Which we would see if the gtests were running, but hotspot common tier1 tests seem broken. > > Do we allow passing in nil? Then, i must be either nil or valid, not != nil or valid. If not, use an AND, not an OR. > i=0 is valid > Could you also please factor out OOB test for i? It's meant to express `P => Q`, which is equivalent to `~P V Q`. We're trying to say `is_not_nil(i) => is_valid(i)`, so it should be `i == nil`, **not** `i != nil`. Yes, it's broken. Also yes, let's make the OOB check a predicate function. > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 80: > >> 78: void free(I i) { >> 79: assert(i != nil || (i > 0 && i < _backing_storage.length()), "out of bounds free"); >> 80: if (i != nil) return; > > i == nil? Not sure how I managed this miracle of coding :), yes.
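
The corrected check discussed above ("if `i` is not nil, then `i` must be in bounds", written as `i == nil || in_bounds(i)`) might look roughly like the sketch below. `SlotTable`, `is_in_bounds`, `is_nil_or_valid`, the `frees` counter, the `std::vector` backing store and the choice of `-1` for `nil` are all assumptions made for illustration; this is not the `IndexedFreeListAllocator` code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature of an index-addressed table (illustration only).
struct SlotTable {
  using I = std::int32_t;
  static constexpr I nil = -1;   // assumed sentinel; the PR may use another value

  std::vector<int> slots;        // stand-in for the backing storage
  int frees = 0;                 // counts non-nil frees, for demonstration

  // Out-of-bounds test factored into a predicate. Index 0 is valid.
  bool is_in_bounds(I i) const {
    return i >= 0 && static_cast<std::size_t>(i) < slots.size();
  }

  // "not nil => in bounds", encoded as "nil OR in bounds".
  bool is_nil_or_valid(I i) const {
    return i == nil || is_in_bounds(i);
  }

  void free(I i) {
    assert(is_nil_or_valid(i) && "out of bounds free");
    if (i == nil) return;        // freeing nil is a no-op
    ++frees;
  }
};
```

With this encoding, `nil` and every index in `[0, size)` pass the assert, while any other value trips it.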
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1642289772 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1642290526 From jsjolen at openjdk.org Mon Jun 17 07:16:13 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 17 Jun 2024 07:16:13 GMT Subject: RFR: 8333994: NMT: call stacks should show source information In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 21:13:05 GMT, Zhengyu Gu wrote: > There is a following comment in this code that says: > > > ``` > > // Note: we deliberately omit printing source information here. NativeCallStack::print_on() > > // can be called thousands of times as part of NMT detail reporting, and source printing > > // can slow down reporting by a factor of 5 or more depending on platform (see JDK-8296931). > > ``` > > but we are in fact looking up and printing more detail here. Is that comment no longer relevant, or is the slow down that goes with this change insignificant? If the performance is poor, then I suspect that caching this information would be useful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2172469670 From amitkumar at openjdk.org Mon Jun 17 07:22:18 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 07:22:18 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 05:02:18 GMT, Varada M wrote: >Testing is fine with release VM, I see one failure test/hotspot/jtreg/gtest/GTestWrapper.java with fastdebug on aix-ppc. @varada1110 can you check the reason for the failure ? is it just because `-nativepath` is not specified ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2172477240 From varadam at openjdk.org Mon Jun 17 07:22:18 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 17 Jun 2024 07:22:18 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 05:02:18 GMT, Varada M wrote:
> Testing is fine with release VM, I see one failure `test/hotspot/jtreg/gtest/GTestWrapper.java` with fastdebug on aix-ppc.
>
> Here is the benchmark results.
>
> ```
> Before patch:
>
> Benchmark                        Mode  Cnt   Score   Error  Units
> InterfaceCalls.test1stInt2Types  avgt   12  18.084 ± 0.160  ns/op
> InterfaceCalls.test1stInt3Types  avgt   12  26.789 ± 0.282  ns/op
> InterfaceCalls.test1stInt5Types  avgt   12  26.926 ± 0.438  ns/op
> InterfaceCalls.test2ndInt2Types  avgt   12  18.500 ± 0.215  ns/op
> InterfaceCalls.test2ndInt3Types  avgt   12  30.996 ± 0.489  ns/op
> InterfaceCalls.test2ndInt5Types  avgt   12  28.395 ± 0.215  ns/op
> InterfaceCalls.testIfaceCall     avgt   12  26.749 ± 0.238  ns/op
> InterfaceCalls.testIfaceExtCall  avgt   12  28.189 ± 0.395  ns/op
> InterfaceCalls.testMonomorphic   avgt   12  15.251 ± 0.126  ns/op
> Finished running test 'micro:vm.compiler.InterfaceCalls'
> Test report is stored in build/aix-ppc64-server-release/test-results/micro_vm_compiler_InterfaceCalls
>
> ==============================
> Test summary
> ==============================
>    TEST                              TOTAL  PASS  FAIL ERROR
>    micro:vm.compiler.InterfaceCalls      1     1     0     0
> ==============================
> TEST SUCCESS
>
> Finished building target 'test' in configuration 'aix-ppc64-server-release'
>
>
> After patch :
>
> Benchmark                        Mode  Cnt   Score   Error  Units
> InterfaceCalls.test1stInt2Types  avgt   12  18.078 ± 0.133  ns/op
> InterfaceCalls.test1stInt3Types  avgt   12  26.816 ± 0.276  ns/op
> InterfaceCalls.test1stInt5Types  avgt   12  26.918 ± 0.233  ns/op
> InterfaceCalls.test2ndInt2Types  avgt   12  18.674 ± 0.204  ns/op
> InterfaceCalls.test2ndInt3Types  avgt   12  30.360 ± 0.251  ns/op
> InterfaceCalls.test2ndInt5Types  avgt   12  28.356 ± 0.394  ns/op
> InterfaceCalls.testIfaceCall     avgt   12  26.676 ± 0.188  ns/op
> InterfaceCalls.testIfaceExtCall  avgt   12  27.613 ± 0.258  ns/op
> InterfaceCalls.testMonomorphic   avgt   12  15.175 ± 0.090  ns/op
> Finished running test 'micro:vm.compiler.InterfaceCalls'
> Test report is stored in build/aix-ppc64-server-release/test-results/micro_vm_compiler_InterfaceCalls
>
> ==============================
> Test summary
> ==============================
>    TEST                              TOTAL  PASS  FAIL ERROR
>    micro:vm.compiler.InterfaceCalls      1     1     0     0
> ==============================
> TEST SUCCESS
> ```

GTestWrapper Failure :

[----------] 2 tests from code
[ RUN ] code.vtableStubs_vm
[ OK ] code.vtableStubs_vm (0 ms)
[ RUN ] code.itableStubs_vm
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/hotspot/openjdk/jdk-varada/src/hotspot/cpu/ppc/assembler_ppc.hpp:1028), pid=16777526, tid=258
# assert(nbits == 32 || (-(1 << (nbits-1)) <= x && x < (1 << (nbits-1)))) failed: value out of range
#
# JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.hotspot.jdk-varada)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.hotspot.jdk-varada, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, aix-ppc64)
# No core dump will be written. Core dumps have been disabled.
To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /home/hotspot/openjdk/jdk-varada/build/aix-ppc64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_gtest_GTestWrapper_java/scratch/0/hs_err_pid16777526.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2172479057 From rehn at openjdk.org Mon Jun 17 07:41:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 07:41:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 09:39:33 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - Cleanup >> - ... and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 114: > >> 112: // Creation >> 113: friend NativeCall* nativeCall_at(address addr); >> 114: friend NativeCall* nativeCall_before(address return_address); > > Is these friend declarations necessary? No, fixed. > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 321: > >> 319: } >> 320: >> 321: void NativeShortCall::replace_mt_safe(address instr_addr, address code_buffer) { > > seems no usage and necessity of these 2 methods `replace_mt_safe` and `insert `? Pre-existing, removing them. 
> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 331: > >> 329: // Creation >> 330: friend NativeCall* nativeCall_at(address addr); >> 331: friend NativeCall* nativeCall_before(address return_address); > > Is these friend declarations necessary? No, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642327838 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642329243 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642328396 From rehn at openjdk.org Mon Jun 17 07:55:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 07:55:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> References: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> Message-ID: On Thu, 13 Jun 2024 19:17:13 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 382: > >> 380: } >> 381: >> 382: address NativeFarCall::reloc_destination(address orig_address) { > > argument `orig_address` is not used All these methods have the same signature, i.e.: `address reloc_destination(address orig_address);` Some may use the argument some may not. 
> src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 430: > >> 428: } >> 429: >> 430: bool NativeFarCall::reloc_set_destination(address dest) { > > argument `dest` is not used. Same here, all these methods have the same signature. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642351788 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642352441 From rehn at openjdk.org Mon Jun 17 07:55:19 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 07:55:19 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v10] In-Reply-To: References: Message-ID: On Wed, 12 Jun 2024 15:51:19 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - Cleanup >> - ... and 5 more: https://git.openjdk.org/jdk/compare/93f3918e...eb30360a > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 510: > >> 508: } >> 509: >> 510: void NativeFarCall::replace_mt_safe(address instr_addr, address code_buffer) { > > seems no usage and necessity of these 2 methods `replace_mt_safe` and `insert` ? Removing them. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642349992 From rehn at openjdk.org Mon Jun 17 08:11:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 08:11:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> References: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> Message-ID: On Thu, 13 Jun 2024 20:26:40 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 387: > >> 385: CodeBlob *code = CodeCache::find_blob(call_addr); >> 386: assert(code != nullptr, "Could not find the containing code blob"); >> 387: address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code); > > should there be an assert like `assert(code->is_nmethod())`? 
Added if instead ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642375280 From dholmes at openjdk.org Mon Jun 17 08:17:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Jun 2024 08:17:15 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v2] In-Reply-To: References: <3dZ0rNWOpXwirrj4n6liCkvvnjPb3F69AJAx1U-R1Wc=.49cf4541-ed1a-4fbb-9499-02a7cb461b42@github.com> Message-ID: On Mon, 17 Jun 2024 06:13:08 GMT, Thomas Stuefe wrote: > I vote for ubsan.hpp and asan.hpp, or for a common header sanitizer.hpp. That's a suggestion that should have been made when we added asan and lsan support at the beginning if last year. The "san" in the name is redundant with the "sanitizers" in the path. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2172594613 From rehn at openjdk.org Mon Jun 17 08:17:19 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 08:17:19 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> References: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> Message-ID: On Thu, 13 Jun 2024 20:34:19 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 392: > >> 390: stub_addr = MacroAssembler::target_addr_for_insn(call_addr); >> 391: } >> 392: return stub_addr; > > Naming here is confusing, as the returned value is not stub addr, but target addr of a jump. > Suggestion: > > if (stub_addr != nullptr) { > return MacroAssembler::target_addr_for_insn(call_addr); > } > return nullptr; The return value MASM::target_addr_for_insn() is for the load in: auipc ld <------------------- jalr During relocation the stub is moved as is, meaning it contains the correct destination. So all we need to do is to make for the auipc + ld is loading from the stub. Thus this code is only concerned about where the stub address is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642384148 From rehn at openjdk.org Mon Jun 17 08:26:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 08:26:17 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> References: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> Message-ID: On Thu, 13 Jun 2024 20:37:58 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 109: > >> 107: >> 108: //----------------------------------------------------------------------------- >> 109: // NativeShortCall > > Both Far and Short call here are named `patchable far calls` in the comment in macroAssembler_riscv.hpp. > So, it will be helpful to unify the naming. macroAssembler_riscv.hpp/nativeInst_riscv.hpp is the interface specification. Meaning it should only if so refer to NativeCall. NativeXXXCall is just an implementation name in the cpp file. Looking a the comment.... I'll try to make an update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642397151 From rehn at openjdk.org Mon Jun 17 08:29:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 08:29:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> References: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> Message-ID: On Thu, 13 Jun 2024 20:26:49 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... 
and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 436: > >> 434: CodeBlob *code = CodeCache::find_blob(call_addr); >> 435: assert(code != nullptr, "Could not find the containing code blob"); >> 436: address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code); > > should there be an assert like `assert(code->is_nmethod())`? No, if. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642401847 From rehn at openjdk.org Mon Jun 17 08:37:14 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 08:37:14 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> References: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> Message-ID: On Fri, 14 Jun 2024 08:43:33 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 169: > >> 167: address addr = addr_at(0); >> 168: if (NativeShortCall::is_at(addr)) { >> 169: NativeShortCall* call = NativeShortCall::at(addr); > > Are these lines necessary? As this is an instance method (rather than static), so `NativeShortCall::is_at(addr)` must already be true? 
Yes as you can do: `NativeCall* nc = (NativeCall*)(address);` > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 257: > >> 255: assert(!NativeShortCallTrampolineStub::is_at(dest), "chained trampolines"); >> 256: NativeShortCallTrampolineStub::at(trampoline_stub_addr)->set_destination(dest); >> 257: } > > Maybe move these lines into `else` block below? as `Assembler::reachable_from_branch_at(call_addr, dest)` condition check does not depends on these `trampoline_stub_addr` related check & set. This is the original code, I have not changed it. > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 438: > >> 436: address stub_addr = trampoline_stub_Relocation::get_trampoline_for(call_addr, (nmethod*)code); >> 437: >> 438: if (stub_addr != nullptr) { > > Could `stub_addr == nullptr`? If positive, then it should return false when it's nullptr, if negative, then should the `if` be converted to an `assert`? Yes, it can, no. Because we never want to execute anything more in `Relocation::pd_set_call_destination()` during NativeFarCall. NativeShortCall (original) do a lot of weird things which seems a bit wrong/unnecessary. I did not want to sort that out as I'm proposing not using it. > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 439: > >> 437: >> 438: if (stub_addr != nullptr) { >> 439: MacroAssembler::pd_patch_instruction_size(call_addr, stub_addr); > > I could be wrong. `stub_addr` should be `dest`? No, it is the stub address. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642416139 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642419489 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642412487 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642422227 From jsjolen at openjdk.org Mon Jun 17 08:45:56 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 17 Jun 2024 08:45:56 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v19] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... 
Done
> Time taken with GrowableArray: 8341.240945
> Time taken with CHeap: 12189.031318
> Time taken with Arena: 8800.703092
> Time taken with GrowableArray again: 8295.508829
>
>
> And on linux-x64:
>
>
> Time taken with GrowableArray: 8378.018135
> Time taken with CHeap: 12437.347868
> Time taken with Arena: 8758.064717
> Time taken with GrowableArray again: 8391.076291
>
>
> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Various minor fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18979/files
  - new: https://git.openjdk.org/jdk/pull/18979/files/17cd6b44..56ab4977

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=17-18

  Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/18979.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979

PR: https://git.openjdk.org/jdk/pull/18979

From jsjolen at openjdk.org Mon Jun 17 08:45:56 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Mon, 17 Jun 2024 08:45:56 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v19]
In-Reply-To:
References:
Message-ID:

On Sat, 15 Jun 2024 06:36:59 GMT, Thomas Stuefe wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Various minor fixes
>
> src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 62:
>
>> 60:   I allocate(Args... args) {
>> 61:     BackingElement* be;
>> 62:     int i;
>
> I i? Then, later, just return i?

Thanks, clean up from wrapping I.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1642431581 From rehn at openjdk.org Mon Jun 17 08:47:19 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 08:47:19 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v11] In-Reply-To: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> References: <2KzGTOLy36zCDT45_0k5TihdONa2d8KetnnizixXwZw=.4c15bb00-37da-4ce4-b498-4e102771e3c6@github.com> Message-ID: On Fri, 14 Jun 2024 10:07:34 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - Review comments >> - Move shart/far code to cpp >> - ... and 6 more: https://git.openjdk.org/jdk/compare/5d2a19de...bb7249b8 > > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 198: > >> 196: Assembler::patch(pInsn, 30, 21, (offset >> 1) & 0x3ff); >> 197: Assembler::patch(pInsn, 20, 20, (offset >> 11) & 0x1); >> 198: Assembler::patch(pInsn, 19, 12, (offset >> 12) & 0xff); > > should we reuse `MacroAssembler::pd_patch_instruction_size`? This is the original code, I have not change it. (just moved from hpp -> cpp) > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 248: > >> 246: } >> 247: >> 248: bool NativeShortCall::reloc_set_destination(address dest) { > > `reloc_set_destination` and `set_destination_mt_safe` are almost same, maybe `set_destination_mt_safe` could call `reloc_set_destination`? 
Ok > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 410: > >> 408: } >> 409: >> 410: bool NativeFarCall::set_destination_mt_safe(address dest, bool assert_lock) { > > Seems no caller will pass `assert_lock == false` No, but this is the interface in NativeCall which just delegates here. > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 410: > >> 408: } >> 409: >> 410: bool NativeFarCall::set_destination_mt_safe(address dest, bool assert_lock) { > > For NativeShortCall, reloc_set_destination and set_destination_mt_safe are almost same, but for NativeFarCall they're different, is this expected? They are not the same case, but the original code tries to handle them in the same method which just makes the code complicated. > src/hotspot/cpu/riscv/nativeInst_riscv.cpp line 423: > >> 421: >> 422: if (stub_addr != nullptr) { >> 423: set_stub_address_destination_at(stub_addr, dest); > > Is `ICache::invalidate_range` needed here? No as we do a data store and data read from this location. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642426941 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642432228 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642429501 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642434302 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642430372 From stefank at openjdk.org Mon Jun 17 09:01:15 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 17 Jun 2024 09:01:15 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share [v3] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 16:28:05 GMT, Leonid Mesnik wrote: >> The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. 
>> The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. >> >> Also, this PR moves test-specific code into corresponding test directories to increase code locality. This allows later easier move tests from vmTestbase. >> >> The few remaining classes include >> InMemoryJavaCompiler.java >> that is very similar to same class from the standard testlibrary and could be merge with it and >> ProcessUtils.java >> which is used by >> test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java >> and thus should be moved into the standard testlibrary. >> The stack and options might be merged in nsk/share test library. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > removed unused import Some of the moves/renames messed up the sort-order of the imports. Could you take a pass over the patch and clean that up? ------------- PR Review: https://git.openjdk.org/jdk/pull/19727#pullrequestreview-2122317400 From jsjolen at openjdk.org Mon Jun 17 09:05:50 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 17 Jun 2024 09:05:50 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v20] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. 
> > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
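The idea described in the PR text (a growable array handing out 4-byte indices, with unused slots chained into a free list) can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: `IndexedFreeList`, `release`, `at` and `capacity` are hypothetical names, `std::vector` stands in for `GrowableArray`, and the free-list link is kept in a separate field instead of reusing the element's storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: elements are addressed by a 32-bit index instead of
// a pointer, and released slots are linked together for reuse.
template <typename E>
class IndexedFreeList {
 public:
  using Index = int32_t;              // the 4-byte "pointer"
  static constexpr Index nil = -1;    // end-of-list marker

  template <typename... Args>
  Index allocate(Args&&... args) {
    if (_free_head != nil) {
      // Reuse the most recently released slot.
      const Index i = _free_head;
      _free_head = _slots[i].next_free;
      _slots[i].value = E(std::forward<Args>(args)...);
      return i;
    }
    // No free slot: grow the backing array. It never shrinks.
    _slots.push_back(Slot{E(std::forward<Args>(args)...), nil});
    return static_cast<Index>(_slots.size() - 1);
  }

  void release(Index i) {
    // Push the slot onto the free list; its value is left as-is.
    _slots[i].next_free = _free_head;
    _free_head = i;
  }

  E& at(Index i) { return _slots[i].value; }
  std::size_t capacity() const { return _slots.size(); }

 private:
  struct Slot {
    E value;
    Index next_free;
  };
  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

The memory figure in the description follows from the index width: going from an 8-byte pointer to a 4-byte index saves 4 bytes per table slot, and 4 * 4099 = 16396 bytes, roughly 16 KiB.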
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Rename append to push, fix asserts

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18979/files
  - new: https://git.openjdk.org/jdk/pull/18979/files/56ab4977..db131696

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=18-19

  Stats: 13 lines in 2 files changed: 0 ins; 6 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/18979.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979

PR: https://git.openjdk.org/jdk/pull/18979

From jsjolen at openjdk.org Mon Jun 17 09:15:46 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Mon, 17 Jun 2024 09:15:46 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist, as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open to better names.
>
> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, as we never remove stacks from the storage.
>
> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>
> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>
> The results are as follows on linux-x64-slowdebug:
>
>
> Generate stacks... Done
> Time taken with GrowableArray: 8341.240945
> Time taken with CHeap: 12189.031318
> Time taken with Arena: 8800.703092
> Time taken with GrowableArray again: 8295.508829
>
>
> And on linux-x64:
>
>
> Time taken with GrowableArray: 8378.018135
> Time taken with CHeap: 12437.347868
> Time taken with Arena: 8758.064717
> Time taken with GrowableArray again: 8391.076291
>
>
> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Fix tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18979/files
  - new: https://git.openjdk.org/jdk/pull/18979/files/db131696..f392c3b5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=19-20

  Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18979.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979

PR: https://git.openjdk.org/jdk/pull/18979

From mdoerr at openjdk.org Mon Jun 17 09:35:29 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 17 Jun 2024 09:35:29 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v8]
In-Reply-To:
References:
Message-ID:

On Sat, 15 Jun 2024 08:48:30 GMT, Martin Doerr wrote:

>> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
>> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea?
(This has been addressed in the discussion.)
>> How can we verify it? By comparing the performance using the micro benchmarks?
>>
>> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test     avgt   15  432.366 ±  8.364  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  432.310 ±  8.460  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  432.422 ± 10.819  ns/op
>> SecondarySuperCacheIntraContention.test     avgt   15  355.192 ±  3.597  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   12.274 ±  0.026  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   12.300 ±  0.039  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   12.304 ±  0.034  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   12.276 ±  0.050  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   12.235 ±  0.044  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   12.308 ±  0.156  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   12.291 ±  0.048  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   12.307 ±  0.052  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   12.398 ±  0.075  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   12.552 ±  0.122  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   12.490 ±  0.083  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   12.565 ±  0.092  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   19.059 ±  0.958  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   19.268 ±  0.124  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   20.059 ±  0.114  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   25.117 ±  0.368  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15   32.735 ±  0.359  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15   34.866 ±  0.152  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15   35.492 ±  0.276  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15   36.620 ±  0.334  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15   37.226 ±  0.180  ns/op
>> SecondarySupersLookup.testNegative59        avgt   15   37.774 ±  0.241  ns/op
>> SecondarySupersLookup.testNegative60        avgt   15   38.627 ±  1.451  ns/op
>> Sec...
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Make sure UseSecondarySupersTable is only used on Power7 or later.

Thanks for all reviews, testing and comments!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2172861495

From mdoerr at openjdk.org Mon Jun 17 09:35:30 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 17 Jun 2024 09:35:30 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2]
In-Reply-To: <4rR63y358m-xvin43-ltLouSYYUOO3igQ-8_f5OqF-o=.77e54e27-0e1a-4a46-82fa-a6446379a659@github.com>
References: <4rR63y358m-xvin43-ltLouSYYUOO3igQ-8_f5OqF-o=.77e54e27-0e1a-4a46-82fa-a6446379a659@github.com>
Message-ID:

On Sat, 15 Jun 2024 09:03:57 GMT, Andrew Haley wrote:

> > That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?
>
> Ah, I see. The test is doing some IR node counts for Klass loads, and `-UseSecondarySupersCache` deletes one of those loads. So it's not actually a behavioural change. I'm not sure what to do about this.

Maybe file an issue and ask the IR test folks to take a look? I don't think this PR is a good place to discuss it.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2172868044

From mdoerr at openjdk.org Mon Jun 17 09:35:30 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 17 Jun 2024 09:35:30 GMT
Subject: Integrated: 8331117: [PPC64] secondary_super_cache does not scale well
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 14:11:36 GMT, Martin Doerr wrote:

> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.)
> How can we verify it? By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test     avgt   15  432.366 ±  8.364  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  432.310 ±  8.460  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  432.422 ± 10.819  ns/op
> SecondarySuperCacheIntraContention.test     avgt   15  355.192 ±  3.597  ns/op
> SecondarySupersLookup.testNegative00        avgt   15   12.274 ±  0.026  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   12.300 ±  0.039  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   12.304 ±  0.034  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   12.276 ±  0.050  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   12.235 ±  0.044  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   12.308 ±  0.156  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   12.291 ±  0.048  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   12.307 ±  0.052  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   12.398 ±  0.075  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   12.552 ±  0.122  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   12.490 ±  0.083  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   12.565 ±  0.092  ns/op
> SecondarySupersLookup.testNegative20        avgt   15   19.059 ±  0.958  ns/op
> SecondarySupersLookup.testNegative30        avgt   15   19.268 ±  0.124  ns/op
> SecondarySupersLookup.testNegative32        avgt   15   20.059 ±  0.114  ns/op
> SecondarySupersLookup.testNegative40        avgt   15   25.117 ±  0.368  ns/op
> SecondarySupersLookup.testNegative50        avgt   15   32.735 ±  0.359  ns/op
> SecondarySupersLookup.testNegative55        avgt   15   34.866 ±  0.152  ns/op
> SecondarySupersLookup.testNegative56        avgt   15   35.492 ±  0.276  ns/op
> SecondarySupersLookup.testNegative57        avgt   15   36.620 ±  0.334  ns/op
> SecondarySupersLookup.testNegative58        avgt   15   37.226 ±  0.180  ns/op
> SecondarySupersLookup.testNegative59        avgt   15   37.774 ±  0.241  ns/op
> SecondarySupersLookup.testNegative60        avgt   15   38.627 ±  1.451  ns/op
> SecondarySupersLookup.testNegative61        avgt   15   39.395 ±  0.249  ns/op
> Seco...

This pull request has now been integrated.

Changeset: 0d1080d1
Author: Martin Doerr
URL: https://git.openjdk.org/jdk/commit/0d1080d194c596dc74dd8b173b18b14cc71e1b52
Stats: 421 lines in 6 files changed: 421 ins; 0 del; 0 mod

8331117: [PPC64] secondary_super_cache does not scale well

Reviewed-by: rrich, amitkumar

-------------

PR: https://git.openjdk.org/jdk/pull/19368

From rrich at openjdk.org Mon Jun 17 09:52:19 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Mon, 17 Jun 2024 09:52:19 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2]
In-Reply-To: <4rR63y358m-xvin43-ltLouSYYUOO3igQ-8_f5OqF-o=.77e54e27-0e1a-4a46-82fa-a6446379a659@github.com>
References: <4rR63y358m-xvin43-ltLouSYYUOO3igQ-8_f5OqF-o=.77e54e27-0e1a-4a46-82fa-a6446379a659@github.com>
Message-ID:

On Sat, 15 Jun 2024 09:03:57 GMT, Andrew Haley wrote:

> > That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?
>
> Ah, I see. The test is doing some IR node counts for Klass loads, and `-UseSecondarySupersCache` deletes one of those loads.
So it's not actually a behavioural change. I'm not sure what to do about this. I think the `@IR` rule can be duplicated using `applyIf` to distinguish between `-XX:-UseSecondarySupersCache` and `-XX:+UseSecondarySupersCache` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2172919531 From amitkumar at openjdk.org Mon Jun 17 10:00:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 10:00:15 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 06:39:45 GMT, Thomas Stuefe wrote: >> An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK. >> >> Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included. >> >> >> ---- >> >> Note that I was torn between two ways to fix this: >> >> - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property >> - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`). See this draft PR: https://github.com/openjdk/jdk/pull/19178 . >> >> The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required. >> >> I can go either way, though I have a slight preference for this PR, which is why I posted it. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build > - JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build I still see some failures which are only failing in exploded-jvm; java/foreign/TestLinker.java java/lang/SecurityManager/CheckSecurityProvider.java java/lang/StackWalker/VerifyStackTrace.java java/lang/System/LoggerFinder/internal/BaseDefaultLoggerFinderTest/BaseDefaultLoggerFinderTest.java java/lang/System/LoggerFinder/internal/BootstrapLogger/BootstrapLoggerTest.java java/lang/System/LoggerFinder/internal/LoggerFinderLoaderTest/LoggerFinderLoaderTest.java java/lang/invoke/RevealDirectTest.java java/lang/invoke/lambda/LogGeneratedClassesTest.java java/lang/reflect/records/IsRecordTest.java java/lang/reflect/records/RecordPermissionsTest.java java/lang/reflect/records/RecordReflectionTest.java java/lang/runtime/ObjectMethodsTest.java java/util/Currency/PropertiesTestRun.java java/util/ResourceBundle/Bug6359330.java java/util/TimeZone/TimeZoneDatePermissionCheckRun.java java/util/logging/LogManager/Configuration/rootLoggerHandlers/BadRootLoggerHandlers.java java/util/logging/LogManager/Configuration/rootLoggerHandlers/RootLoggerHandlers.java java/util/logging/LogManager/Configuration/updateConfiguration/SimpleUpdateConfigWithInputStreamTest.java java/util/logging/LogManager/Configuration/updateConfiguration/UpdateConfigurationTest.java java/util/logging/Logger/getGlobal/TestGetGlobal.java java/util/logging/Logger/getGlobal/TestGetGlobalByName.java java/util/logging/Logger/getGlobal/TestGetGlobalConcurrent.java java/util/logging/Logger/setResourceBundle/TestSetResourceBundle.java java/util/logging/TestMainAppContext.java jdk/internal/jimage/JImageReadTest.java jdk/modules/etc/JmodExcludedFiles.java sun/reflect/ReflectionFactory/ReflectionFactoryTest.java 
sun/util/locale/provider/Bug8152817.java

Did you also notice them in tier1 with an exploded build? I only looked at the `TestLinker.java` failure; it doesn't "seem" to be related to CDS:

TEST RESULT: Failed. Execution failed: `main' threw exception: java.util.ServiceConfigurationError: Locale provider adapter "CLDR"cannot be instantiated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2172935164

From mli at openjdk.org Mon Jun 17 10:05:13 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 17 Jun 2024 10:05:13 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v7]
In-Reply-To:
References:
Message-ID:

On Sat, 15 Jun 2024 07:06:23 GMT, Gui Cao wrote:

>> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
>> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) with UseZbb enabled and UseZba, UseZbs not enabled
>> Original:
>>
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
>> Seco...
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Put "secondary super table" generate code inside COMPILER2 macro

Marked as reviewed by mli (Reviewer).
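The quoted description says the optimization depends on the `cpop` (population count) instruction. The following sketch shows, in a generic way, why a population count is useful for this kind of lookup: a 64-bit occupancy bitmap rejects most misses with a single bit test, and counting the set bits below a slot gives the position in a densely packed array. It is a simplified illustration and is not meant to reproduce the HotSpot stub; `SparseTable64` and `popcount64` are hypothetical names, and hash collisions are simply refused here rather than resolved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable population count; a cpop-style instruction does this in hardware.
int popcount64(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Hypothetical sparse table with 64 logical slots stored in a packed vector.
class SparseTable64 {
 public:
  // Stores value in the given slot (0..63); returns false if it is occupied.
  bool insert(unsigned slot, uint32_t value) {
    const uint64_t bit = uint64_t{1} << slot;
    if ((_bitmap & bit) != 0) {
      return false;  // collision handling is out of scope for this sketch
    }
    const int idx = popcount64(_bitmap & (bit - 1));  // occupied slots below
    _packed.insert(_packed.begin() + idx, value);
    _bitmap |= bit;
    return true;
  }

  bool contains(unsigned slot, uint32_t value) const {
    const uint64_t bit = uint64_t{1} << slot;
    if ((_bitmap & bit) == 0) {
      return false;  // constant-time negative answer, no scan
    }
    return _packed[popcount64(_bitmap & (bit - 1))] == value;
  }

 private:
  uint64_t _bitmap = 0;
  std::vector<uint32_t> _packed;
};
```

In the "Original" numbers quoted above, the `testNegativeNN` times grow with NN, which is consistent with a linear scan on a miss; a bitmap test of this kind is one way to make a miss independent of the number of entries.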
------------- PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2122475235 From mli at openjdk.org Mon Jun 17 10:05:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Jun 2024 10:05:14 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v6] In-Reply-To: <9UpkCN44laVTS1P7Eax4cZw82HdiSzuogLSaeXDdhPM=.c6694a54-1ec5-47e0-a846-02439f324dbb@github.com> References: <9UpkCN44laVTS1P7Eax4cZw82HdiSzuogLSaeXDdhPM=.c6694a54-1ec5-47e0-a846-02439f324dbb@github.com> Message-ID: On Sat, 15 Jun 2024 07:09:29 GMT, Gui Cao wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5613: >> >>> 5611: } >>> 5612: >>> 5613: #ifdef COMPILER2 >> >> Maybe put other "secondary super table" related code also inside COMPILER2 macro? > > Hi, I I've put "secondary super table" related generate code inside COMPILER2 macro. The related code in macroAssembler, which I guess the C1 optimization[1] may also use, is currently consistent with arm64,x86, etc., and has not been put into COMPILER2 macro. > > [1] https://bugs.openjdk.org/browse/JDK-8331658 Thanks for updating. Looks good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1642551885 From jsjolen at openjdk.org Mon Jun 17 10:11:13 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 17 Jun 2024 10:11:13 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 11:59:05 GMT, Thomas Stuefe wrote: > Arenas carry NMT flags. > > An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. > > As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. 
> > The patch does that: > - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) > - CompilerThread hands in mtCompiler, all other threads rely on the default > - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in > - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena > - it also allows us to make Arena::flags private > > Other, unrelated cleanups: > - Made Arena::_size_in_bytes and Arena::_tag private > - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor > - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. > > Tests: > > I manually verified that the NMT numbers printed don't change. Hi Thomas, This looks like a very good cleanup to me, thank you. All the best, Johan src/hotspot/share/compiler/compilerThread.cpp line 35: > 33: CompilerThread::CompilerThread(CompileQueue* queue, > 34: CompilerCounters* counters) > 35: : JavaThread(&CompilerThread::thread_entry, 0, mtCompiler) { Style, pre-existing: Shouldn't be this far indented. src/hotspot/share/memory/arena.hpp line 98: > 96: > 97: private: > 98: Style: No space between access specifier and next declaration. ------------- Marked as reviewed by jsjolen (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19693#pullrequestreview-2122466958 PR Review Comment: https://git.openjdk.org/jdk/pull/19693#discussion_r1642548209 PR Review Comment: https://git.openjdk.org/jdk/pull/19693#discussion_r1642547269 From amitkumar at openjdk.org Mon Jun 17 10:12:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 10:12:43 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v3] In-Reply-To: References: Message-ID: > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failures appearing, except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); but this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ±
0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ± 1.628 ns... Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/s390/macroAssembler_s390.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/d2d56f8e..b7f0554a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From amitkumar at openjdk.org Mon Jun 17 10:12:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 10:12:43 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 13:15:38 GMT, Amit Kumar wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with
`-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failures appearing, except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); but this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ±
2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > Removes unused Labels & makes comment more sensible src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3157: > 3155: > 3156: // scans count pointer sized words at [addr] for occurrence of value, > 3157: // generic (count must be >0) Suggestion: // scans count pointer sized words at [r_addr] for occurrence of r_value, // generic (r_count must be >0) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1642564350 From mli at openjdk.org Mon Jun 17 10:14:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Jun 2024 10:14:14 GMT Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: <8ZyZ714PEAEWRF290pni1GMVjgVNb0GHjFImmp6xgMw=.9ec8e9b4-be14-48cc-a633-abb59b3fa173@github.com> References: <5kfe-OxwcEK0A20RTgeuue-RB-mkk16xz_SfJvIW-iE=.41b3cdad-0ab8-4b1e-8e16-552028d0bbd2@github.com> <8ZyZ714PEAEWRF290pni1GMVjgVNb0GHjFImmp6xgMw=.9ec8e9b4-be14-48cc-a633-abb59b3fa173@github.com> Message-ID: <4PB2Xr3tNIGHWStynDRgEqTj-a5Kcb4a3eW6NtPiMeo=.ce8ce063-2384-4539-91b5-8bd7870c5a12@github.com> On Sat, 15 Jun 2024 06:50:51 GMT, Gui Cao wrote: >> Yes, rv* is much better, I'm OK with this renaming. >> >> At the same time, can you fix `WHITE_BOX.getCPUFeatures()` with `CPUInfo.getFeatures()` in IREncodingPrinter.java? As I think it's the final fix for this kind of issue. As I said, with a `String.contains(xxx)`, it could fail with other cpu features in the future, as it mixes all cpu features in one long string, and there is no guarantee the similar issue will not happen again.
> >> Yes, rv* is much better, I'm OK with this renaming. >> >> At the same time, can you fix `WHITE_BOX.getCPUFeatures()` with `CPUInfo.getFeatures()` in IREncodingPrinter.java? As I think it's the final fix for this kind of issue. As I said, with a `String.contains(xxx)`, it could fail with other cpu features in the future, as it mixes all cpu features in one long string, and there is no guarantee the similar issue will not happen again. > > When I modify it this way, x86 fastdebug has some errors. > ``` diff > diff --git a/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java b/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > index 73943db3f53..03eba7c6c2c 100644 > --- a/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > +++ b/test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > @@ -29,6 +29,7 @@ > import compiler.lib.ir_framework.shared.*; > import jdk.test.lib.Platform; > import jdk.test.whitebox.WhiteBox; > +import jdk.test.whitebox.cpuinfo.CPUInfo; > > import java.lang.reflect.Method; > import java.nio.ByteOrder; > @@ -416,7 +417,7 @@ private boolean checkCPUFeature(String feature, String value) { > TestFormat.failNoThrow("Provided incorrect value for feature " + feature + failAt()); > return false; > } > - String cpuFeatures = WHITE_BOX.getCPUFeatures(); > + List cpuFeatures = CPUInfo.getFeatures(); > return (trueValue && cpuFeatures.contains(feature)) || (falseValue && !cpuFeatures.contains(feature)); > } > > > > > cpu info: > > processor : 127 > vendor_id : GenuineIntel > cpu family : 6 > model : 106 > model name : Intel(R) Xeon(R) Platinum 8378C CPU @ 2.80GHz > stepping : 6 > microcode : 0x1 > cpu MHz : 2799.998 > cache size : 58368 KB > physical id : 1 > siblings : 64 > core id : 31 > cpu cores : 32 > apicid : 127 > initial apicid : 127 > fpu : yes > fpu_exception : yes > cpuid level : 13 > wp : yes > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsave... I see, thanks for updating. I guess there could be other potential issues in the future, but that is related to existing code, and I don't have a good simple solution for it. Let's move forward with this one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19686#discussion_r1642567012 From mli at openjdk.org Mon Jun 17 10:17:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 17 Jun 2024 10:17:13 GMT Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 03:24:15 GMT, Gui Cao wrote: > Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information. > > As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases.
> > After this patch, we can get cpu feature string like this: > > ----------System.out:(4/168)---------- > WB.getCPUFeatures(): "rv64 rvi rvm rva rvf rvd rvc rvv" > CPUInfo.getAdditionalCPUInfo(): "" > CPUInfo.getFeatures(): [rv64, rvi, rvm, rva, rvf, rvd, rvc, rvv] > TEST PASSED > > > ### Testing > - [x] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) > - [x] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug) > - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug) Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19686#pullrequestreview-2122504709 From amitkumar at openjdk.org Mon Jun 17 10:19:46 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 10:19:46 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failures appearing, except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); but this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ±
0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ± 1.628 ns...
Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp - rename: r_scratch to r_result in repne_scan method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/b7f0554a..1042f43a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From amitkumar at openjdk.org Mon Jun 17 10:19:47 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 10:19:47 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 10:16:46 GMT, Amit Kumar wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failures appearing, except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); but this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ±
0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ...
> > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - rename: r_scratch to r_result in repne_scan method src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3156: > 3154: } > 3155: > 3156: // scans count pointer sized words at [r_addr] for occurrence of r_value, Suggestion: // scans r_count pointer sized words at [r_addr] for occurrence of r_value, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1642571738 From mbaesken at openjdk.org Mon Jun 17 10:27:23 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 17 Jun 2024 10:27:23 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 06:49:41 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > move ATTRIBUTE_NO_UBSAN to a separate line Under 'sanitizers' there would be a header for one sanitizer called 'ubsan' - that naming might be a little redundant but would still make sense because that's the name of the tool; but I can live with either ub.hpp or ubsan.hpp. Are there any other opinions?
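A minimal sketch of what such a macro could look like, based only on the snippet quoted in the message above. The macro name `ATTRIBUTE_NO_UBSAN` comes from the PR's commit message; the empty fallback for other compilers and the example function `hash_step` are assumptions made for illustration, and this is not the actual HotSpot header.

```cpp
#include <cassert>

// Expands to the exclusion attribute on compilers that support it,
// and to nothing elsewhere (the empty fallback is an assumption).
#if defined(__clang__) || defined(__GNUC__)
#define ATTRIBUTE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define ATTRIBUTE_NO_UBSAN
#endif

// Hypothetical function used only to show where the macro is placed:
// on its own line, directly before the declaration. The body itself is
// well-defined (unsigned arithmetic wraps), so it behaves the same with
// or without the attribute.
ATTRIBUTE_NO_UBSAN
inline unsigned hash_step(unsigned hash, unsigned value) {
  return hash * 31u + value;
}
```

With this in place, each excluded function carries one macro line instead of repeating the three-line `#if ... #endif` block.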
------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2172991559 From amitkumar at openjdk.org Mon Jun 17 10:35:18 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 10:35:18 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > consistency I have tried to reproduce it, but I'm constantly getting a "-nativepath" not specified error, even though I have hardcoded the native directory path. I'll try to look into the failure once I'm able to reproduce it on my machine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2173004544 From aph at openjdk.org Mon Jun 17 10:50:20 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 17 Jun 2024 10:50:20 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: <4rR63y358m-xvin43-ltLouSYYUOO3igQ-8_f5OqF-o=.77e54e27-0e1a-4a46-82fa-a6446379a659@github.com> References: <4rR63y358m-xvin43-ltLouSYYUOO3igQ-8_f5OqF-o=.77e54e27-0e1a-4a46-82fa-a6446379a659@github.com> Message-ID: On Sat, 15 Jun 2024 09:03:57 GMT, Andrew Haley wrote: >>> That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue? >> >> I've never seen this. It must be a regression. I'll have a look. > >> That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue? > > Ah, I see. The test is doing some IR node counts for Klass loads, and `-UseSecondarySupersCache` deletes one of those loads.
So it's not actually a behavioural change. I'm not sure what to do about this. > > > That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue? > > > > > > Ah, I see. The test is doing some IR node counts for Klass loads, and `-UseSecondarySupersCache` deletes one of those loads. So it's not actually a behavioural change. I'm not sure what to do about this. > > Maybe file an issue and ask the IR test folks to take a look? I don't think this PR is a good place to discuss it. The tests are not all expected to work with arbitrary combinations of -XX arguments. In many cases, they will surely fail. I'll be working on a followup patch that will remove all uses of `secondary_super_cache`, and at that point the test will need to be fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2173061620 From stuefe at openjdk.org Mon Jun 17 10:57:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 10:57:16 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 10:24:58 GMT, Matthias Baesken wrote: > Under 'sanitizers' there would be a header for one sanitizer called 'ubsan' - that naming might be a little redundant but would still make sense because that's the name of the tool; but I can live with either ub.hpp or ubsan.hpp. Are there any other opinions? I'm fine with all proposals so far.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2173074347 From gcao at openjdk.org Mon Jun 17 11:30:13 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 17 Jun 2024 11:30:13 GMT Subject: RFR: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 05:49:59 GMT, Fei Yang wrote: >> Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information. >> >> As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases. >> >> After this patch, we can get cpu feature string like this: >> >> ----------System.out:(4/168)---------- >> WB.getCPUFeatures(): "rv64 rvi rvm rva rvf rvd rvc rvv" >> CPUInfo.getAdditionalCPUInfo(): "" >> CPUInfo.getFeatures(): [rv64, rvi, rvm, rva, rvf, rvd, rvc, rvv] >> TEST PASSED >> >> >> ### Testing >> - [x] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) >> - [x] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug) >> - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug) > > Nice cleanup! Thanks! > > Suggestion about the JBS title: `RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV` @RealFYang @Hamlin-Li : Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19686#issuecomment-2173135628 From gcao at openjdk.org Mon Jun 17 11:38:14 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 17 Jun 2024 11:38:14 GMT Subject: Integrated: 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 03:24:15 GMT, Gui Cao wrote: > Hi, test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java fails without RVV after [JDK-8332153](https://bugs.openjdk.org/browse/JDK-8332153) in fastdebug mode. see jbs issue for exception information. > > As discussed on jbs, we prefixed the single letter cpu features with rv so that there would be no problem. And to synchronize the test cases. > > After this patch, we can get cpu feature string like this: > > ----------System.out:(4/168)---------- > WB.getCPUFeatures(): "rv64 rvi rvm rva rvf rvd rvc rvv" > CPUInfo.getAdditionalCPUInfo(): "" > CPUInfo.getFeatures(): [rv64, rvi, rvm, rva, rvf, rvd, rvc, rvv] > TEST PASSED > > > ### Testing > - [x] All Tests related to all changes in this patch on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) > - [x] All Tests related to all changes in this patch on SOPHON SG2042 (fastdebug) > - [ ] Run tier1-3 tests on SOPHON SG2042 (fastdebug) This pull request has now been integrated. 
Changeset: ef7923e1 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/ef7923e1277ce86c6e5331871f1031c28bf82e31 Stats: 207 lines in 14 files changed: 3 ins; 149 del; 55 mod 8334078: RISC-V: TestIntVect.java fails after JDK-8332153 when running without RVV Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/19686 From rehn at openjdk.org Mon Jun 17 12:00:38 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 17 Jun 2024 12:00:38 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v13] In-Reply-To: References: Message-ID: <17ZAbvTeE5BVr3IfiBEUueHTS4sSEaDywTUHf5Q8TLI=.cd33ced6-c102-4b6f-b38d-4b3fa30fa22b@github.com> > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > With a very small application, or one running for a very short time, we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get hot fast calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists.
> > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of the issues which arm has)) when in reach, and vice versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - To be pushed - Merge branch 'master' into 8332689 - Review comments, removed dead code. - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Remove tmp file - Prepare for dynamic NativeCall size - ...
and 10 more: https://git.openjdk.org/jdk/compare/31e8deba...b51702b4 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=12 Stats: 874 lines in 16 files changed: 611 ins; 168 del; 95 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From rcastanedalo at openjdk.org Mon Jun 17 12:13:51 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 17 Jun 2024 12:13:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 Message-ID: This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. ## Summary of the Changes ### Platform-Independent Changes (`src/hotspot/share`) These consist mainly of: - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and - temporary support for porting the JEP to the remaining platforms. The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
### Platform-Dependent Changes (`src/hotspot/cpu`) These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. #### ADL Changes The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. #### `G1BarrierSetAssembler` Changes Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This class is already available in all platforms that support ZGC (see [JDK-8330685](https://bugs.openjdk.org/browse/JDK-8330685)).
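The predicate-based selection between `storeP` and `g1StoreP` described under "ADL Changes" can be pictured with a small toy model in C++. This is only an illustration under stated assumptions, not ADL and not HotSpot code: `StorePNode` and `select_store_instruction` are hypothetical names, and the two conditions (`barrier_data == 0` for the GC-agnostic rule, G1 in use and `barrier_data != 0` for the G1 rule) follow the predicates given in the porting notes of this message.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a C2 store node; only the barrier tag is modelled.
struct StorePNode {
  unsigned barrier_data;  // 0 means no barrier information is attached
};

// Returns the name of the instruction whose predicate would accept the node.
// An empty string means neither modelled rule matches (for example, a tagged
// access under another collector, which is outside this sketch).
std::string select_store_instruction(const StorePNode& node, bool use_g1_gc) {
  if (node.barrier_data == 0) {
    return "storeP";     // GC-agnostic version, enabled only without a tag
  }
  if (use_g1_gc) {
    return "g1StoreP";   // barrier-aware G1 version
  }
  return "";
}
```

Because the two predicates are mutually exclusive, a tagged store can never be matched by the GC-agnostic rule, which is what makes the barrier-aware instruction responsible for emitting both the barrier and the access.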
### Test Changes (`test`) The changeset includes: - a comprehensive set of tests that verify, using the [IR Test Framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md), that barriers are generated and optimized as expected in different scenarios and configurations; - a test that triggers a latent issue in C2's OopMap building logic (addressed by this changeset), where undefined values generated by C2's implementation of ADL `TEMP` operands are included in OopMaps; - removal of memory limits in tests where, before this changeset, C2 would rather bail out silently; - adjustment of the OptoAssembly output expected by `compiler.c2.aarch64.TestVolatiles`; and - relaxation of the expectations in a case of `compiler.c2.irTests.scalarReplacement.AllocationMergesTests` where C2 is now able to reduce an allocation. ## Notes for Port Maintainers Porting this JEP to a different platform involves the following tasks: - Predicate all existing ADL instructions that may match a C2 memory access operation so that the match is only enabled if the operations do not include barrier information (`barrier_data() == 0`). The relevant memory access operations are (for `X` in `{P, N}`): `StoreX`, `CompareAndExchangeX`, `CompareAndSwapX`, `WeakCompareAndSwapX`, `GetAndSetX`, and `LoadX`. - Create a new ADL file (with suggested name `src/hotspot/cpu/$PLATFORM/gc/g1/g1_$PLATFORM.ad`) where all G1-specific memory access instructions are defined and predicated with (`UseG1GC && barrier_data() != 0`). It is important to use the same instruction naming convention for all platforms (`g1StoreX`, `g1CompareAndExchangeX`, etc.), to support running the new IR Test Framework tests. The instruction implementations are responsible for generating the appropriate barrier code as well as the memory access itself. 
Generating the barrier code typically involves reserving temporary registers to support OOP encoding/decoding and the barrier operations themselves, creating a `G1PreBarrierStubC2` or `G1PostBarrierStubC2`, notifying the stub object which registers are and aren't live at the barrier point (besides those that are live out of the entire instruction), and calling into `G1BarrierSetAssembler` to generate the actual barrier machine code. - Generalize, and possibly refactor, the logic already existing in `G1BarrierSetAssembler` to generate barrier code from the newly introduced ADL instructions. - Implement `G1BarrierSetAssembler::generate_c2_pre_barrier_stub()` and `G1BarrierSetAssembler::generate_c2_post_barrier_stub()` to generate the out-of-line, slow path of the barriers. The slow path includes a call into the JVM, which is supported by the `SaveLiveRegisters` class, typically implemented in `src/hotspot/cpu/$PLATFORM/gc/shared/barrierSetAssembler_$PLATFORM.*`. - For the arm (32-bit), s390, and x86 (32-bit) platforms, implement the `SaveLiveRegisters` class (or similar support). Essentially, this class implements saving and restoring registers given by the barrier stub class, according to the calling convention of the platform. - Optionally, extend `Matcher::pd_clone_node()` and introduce an additional `g1EncodePAndStoreN` ADL instruction to support the removal of redundant decompression operations (see the [JEP description](https://openjdk.org/jeps/475), subsection "Candidate optimizations"). To ease these tasks, this changeset includes the following additional functionality, which is guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT` and will be removed after all ports are merged in this pull request and before integration: - `G1UseLateBarrierExpansion` JVM flag to enable/disable late barrier expansion. This flag can be useful to diagnose issues in the generated barrier code by comparing to that generated for early barrier expansion. 
- `G1StressBarriers` JVM flag to run G1 under an extreme configuration that exercises otherwise rarely executed barrier paths. This flag can be useful to improve test coverage and find low-frequency bugs in the barrier implementation.

## Testing

### Functionality

| tests | default configuration | `-XX:-UseCompressedOops` | `-XX:+G1StressBarriers` | `-XX:-UseCompressedOops -XX:+G1StressBarriers` |
| ----- | --------------------- | ------------------------ | ----------------------- | ---------------------------------------------- |
| tier1-tier3 | all Oracle-supported platforms * | all Oracle-supported platforms | all Oracle-supported platforms | all Oracle-supported platforms |
| tier4-tier5 | all Oracle-supported platforms | linux-x64, linux-aarch64 | linux-x64, linux-aarch64 | - |
| tier6-tier8 | all Oracle-supported platforms | - | - | - |
| jcstress | linux-x64, linux-aarch64 | linux-x64, linux-aarch64 | linux-x64, linux-aarch64 | linux-x64, linux-aarch64 |

* _all Oracle-supported platforms_: linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64

### C2 Execution Time

On average, this changeset reduces C2's execution time, when using the G1 collector, by 15% (on x64) and 18% (on aarch64) across all [DaCapo 23.11-chopin](https://www.dacapobench.org/) benchmarks. C2's execution time is measured, on both compared JVM versions, using the HotSpot options `-Xbatch -XX:-TieredCompilation -XX:+CITime`.

### Quality of C2-Generated Code

#### Speed

Over a wide range of standard benchmark suites (including DaCapo 9.12-bach, DaCapo 23.11-chopin, Renaissance, SPECjbb2015, and SPECjvm2008) run across all Oracle-supported platforms, this changeset yields 101 statistically significant speedups (including four double-digit speedups, up to 17%) and 16 statistically significant regressions (down to -6%).
This general speedup can be explained by a combination of the following three factors: a positive net effect in the compiler inlining heuristics (due to differences in how these handle barrier code), the effect of a lower C2 overhead in short-running and/or CPU-saturating benchmarks, and, to a lesser extent, C2 optimizations enabled by late barrier expansion (such as [barrier elision](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-March/046283.html) and [allocation reduction](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-8334060-g1-late-barrier-expansion#diff-bd6f2e9621b343c4322e740e187e1874dd809bfb51af621ae5782522bc17ae8fR1358-R1363)).

#### Size

The changeset does not cause any statistically significant difference in the size of the code generated by C2 for any [DaCapo 23.11-chopin](https://www.dacapobench.org/) benchmark on the x64 or aarch64 platforms. Size of C2-generated code is measured using the same HotSpot options as for C2 execution time, and compared after normalizing to the total number of bytecodes compiled (i.e. the unit of comparison is C2-compiled bytes per C2-compiled bytecode).

### Comprehensibility to Non-C2 Developers

In parallel with the JEP implementation, two Oracle GC engineers have successfully and independently prototyped non-trivial G1 barrier enhancements using the late barrier expansion model. According to their feedback, these enhancements would have been significantly harder to prototype, if at all possible, using early barrier expansion without requiring assistance from a C2 engineer.
------------- Commit messages: - Implement JEP 475 Changes: https://git.openjdk.org/jdk/pull/19746/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334060 Stats: 3656 lines in 35 files changed: 3355 ins; 174 del; 127 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From fweimer at openjdk.org Mon Jun 17 12:56:18 2024 From: fweimer at openjdk.org (Florian Weimer) Date: Mon, 17 Jun 2024 12:56:18 GMT Subject: RFR: 8333133: Simplify QuickSort::sort In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 04:48:46 GMT, David Holmes wrote: > > It does not provide any such thing. All the flag does is prevent swapping of > > equivalent elements, which doesn't give us any interesting additional ordering > > property. > > I only meant the sort order of the equivalent elements would be maintained. I think the partitioning phase swaps inequal elements based on comparison with the pivot, and this can move elements equivalent to the pivot past the pivot, with or without that additional equality check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19464#issuecomment-2173322389 From zgu at openjdk.org Mon Jun 17 13:27:14 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 17 Jun 2024 13:27:14 GMT Subject: RFR: 8333962: Obsolete OldSize [v2] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 10:19:47 GMT, Albert Mingkun Yang wrote: >> Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit:
>
> obsolete-old-size

src/hotspot/share/gc/shared/genArguments.hpp line 36:

> 34: extern size_t MaxOldSize;
> 35:
> 36: extern size_t OldSize;

Any reason we still want to keep the `OldSize` variable? Because what GCs really care about are the `init`, `min` and `max` values.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19647#discussion_r1642814306

From szaldana at openjdk.org Mon Jun 17 13:39:16 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Mon, 17 Jun 2024 13:39:16 GMT
Subject: RFR: 8333769: Pretouching tests dont test pretouching
In-Reply-To:
References:
Message-ID:

On Sat, 15 Jun 2024 07:56:01 GMT, Thomas Stuefe wrote:

> Good in general. Does it run as part of GHAs?

@tstuefe That's correct. It runs under the hs/tier1 gc tests.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2173419347

From ayang at openjdk.org Mon Jun 17 14:14:13 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 17 Jun 2024 14:14:13 GMT
Subject: RFR: 8333962: Obsolete OldSize [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 13:25:00 GMT, Zhengyu Gu wrote:

>> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>>
>> obsolete-old-size
>
> src/hotspot/share/gc/shared/genArguments.hpp line 36:
>
>> 34: extern size_t MaxOldSize;
>> 35:
>> 36: extern size_t OldSize;
>
> Any reason we still want to keep the `OldSize` variable? Because what GCs really care about are the `init`, `min` and `max` values.

The concept of "initial old-gen size" will always be there. If `OldSize` is removed, all readers need to be updated to `InitialHeapSize - NewSize`. It's not obvious that is definitely better/more readable.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19647#discussion_r1642885640

From zgu at openjdk.org Mon Jun 17 14:24:14 2024
From: zgu at openjdk.org (Zhengyu Gu)
Date: Mon, 17 Jun 2024 14:24:14 GMT
Subject: RFR: 8333962: Obsolete OldSize [v2]
In-Reply-To:
References:
Message-ID: <9wcGUx1Tyv7uL_qE19qUb-65JJglpKpmxxsoZhqTuMo=.90e1be90-0f90-4d43-8bb7-87aa3d6a59cc@github.com>

On Fri, 14 Jun 2024 10:19:47 GMT, Albert Mingkun Yang wrote:

>> Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size.
>
> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
> obsolete-old-size

LGTM

-------------

Marked as reviewed by zgu (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19647#pullrequestreview-2123055719

From mdoerr at openjdk.org Mon Jun 17 14:29:21 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 17 Jun 2024 14:29:21 GMT
Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote:

>> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352)
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
> consistency

I'll take a closer look when I find more time.
@varada1110: Do you still have the hs_err file for the error you reported? It would be helpful to see which instruction is getting the value which is too large. Note that we have `add_const_optimized` available on PPC64 for large offsets if it's an addi instruction.

src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1976:

> 1974: bne(CCR0, L_loop_search_resolved);
> 1975:
> 1976: mr_if_needed(holder_offset, scan_temp);

I prefer using `mr` without "_if_needed" after you asserted that the registers are different.
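
To illustrate the point about `mr` versus `mr_if_needed`, here is a minimal sketch. It assumes that the "_if_needed" variant simply skips the move when source and destination are the same register; `FakeAssembler` and `move_distinct` are hypothetical names, registers are plain integers, and nothing here is the PPC64 `MacroAssembler` itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler that records what it emits.
struct FakeAssembler {
  std::vector<std::string> code;

  // Unconditional register move: always emits one instruction.
  void mr(int dst, int src) {
    code.push_back("mr r" + std::to_string(dst) + ", r" + std::to_string(src));
  }

  // Assumed behavior of the "_if_needed" variant: emit nothing when dst == src.
  void mr_if_needed(int dst, int src) {
    if (dst != src) {
      mr(dst, src);
    }
  }
};

// Once the caller has asserted that the registers differ, the conditional
// variant can never skip the move, so a plain mr says the same thing.
void move_distinct(FakeAssembler& masm, int dst, int src) {
  assert(dst != src && "registers must differ");
  masm.mr(dst, src);
}
```

Under that assumption both variants emit the same single instruction for distinct registers, so the plain form is only a readability choice: it does not suggest a case that the preceding assert has already ruled out.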
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1983: > 1981: cmpdi(CCR0, holder_offset, 0); > 1982: beq(CCR0, L_search_holder); > 1983: mr_if_needed(scan_temp, holder_offset); Same, here. ------------- PR Review: https://git.openjdk.org/jdk/pull/19733#pullrequestreview-2123057335 PR Review Comment: https://git.openjdk.org/jdk/pull/19733#discussion_r1642902552 PR Review Comment: https://git.openjdk.org/jdk/pull/19733#discussion_r1642902887 From ayang at openjdk.org Mon Jun 17 14:30:30 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Jun 2024 14:30:30 GMT Subject: RFR: 8333962: Obsolete OldSize [v3] In-Reply-To: References: Message-ID: <2teqQqHjCobhvE7McOCbECpQ1z4kxnFg74KoqFHZiFg=.93fae87f-eeef-4010-a458-7a5aebbae869@github.com> > Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: obsolete-old-size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19647/files - new: https://git.openjdk.org/jdk/pull/19647/files/3e820b6f..a27f5172 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19647&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19647&range=01-02 Stats: 1483 lines in 67 files changed: 966 ins; 310 del; 207 mod Patch: https://git.openjdk.org/jdk/pull/19647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19647/head:pull/19647 PR: https://git.openjdk.org/jdk/pull/19647 From ayang at openjdk.org Mon Jun 17 14:30:30 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Jun 2024 14:30:30 GMT Subject: RFR: 8333962: Obsolete OldSize [v2] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 10:19:47 GMT, Albert Mingkun Yang wrote: >> Obsolete OldSize and related code. 
An internal variable `OldSize` is kept to capture the capacity of old-gen size. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > obsolete-old-size Synced with master. Running some tests before merging. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19647#issuecomment-2173573858 From asmehra at openjdk.org Mon Jun 17 14:38:11 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 17 Jun 2024 14:38:11 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 13:24:42 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). > > We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). > > Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: > > - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. > - Running the modified test with all collectors. > > Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. > > Looking forward to your comments, > Sonia `TestAlwaysPreTouchBehavior` has covered all GCs except the Serial. Is that intentional? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2173592012 From stuefe at openjdk.org Mon Jun 17 15:19:42 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 15:19:42 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v2] In-Reply-To: References: Message-ID: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - caching - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - exclude macos from testing source info - copyrights - test - JDK-8333994-NMT-call-stacks-should-show-source-information ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19655/files - new: https://git.openjdk.org/jdk/pull/19655/files/c6a20adb..63240369 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=00-01 Stats: 6576 lines in 440 files changed: 4694 ins; 600 del; 1282 mod Patch: https://git.openjdk.org/jdk/pull/19655.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19655/head:pull/19655 PR: https://git.openjdk.org/jdk/pull/19655 From mdoerr at openjdk.org Mon Jun 17 15:28:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 17 Jun 2024 15:28:13 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the 
last revision:
>
> consistency

The error can be reproduced by `make run-test TEST="gtest:code"`. The generated hs_err file shows:

V [libjvm.so+0x8ad7ac] Assembler::addi(Register, Register, int)+0x16c (assembler_ppc.hpp:1027)
V [libjvm.so+0x191f758] MacroAssembler::lookup_interface_method_stub(Register, Register, Register, Register, Register, Register, int, Label&)+0x3e8 (macroAssembler_ppc.cpp:1988)
V [libjvm.so+0x21ca8d0] VtableStubs::create_itable_stub(int)+0x380 (vtableStubs_ppc_64.cpp:189)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2173710317

From stuefe at openjdk.org Mon Jun 17 15:28:14 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 17 Jun 2024 15:28:14 GMT
Subject: RFR: 8333994: NMT: call stacks should show source information [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 14 Jun 2024 21:13:05 GMT, Zhengyu Gu wrote:

> > I have the same question. Did dwarf decoder performance improve? If so, could you point me the PR? Thanks!

I completely forgot that this had been an issue. The comment was even written by me :(

No, the ELF decoder is still slow. But I have found myself too many times staring at NMT output now trying to make sense of the offsets. Missing source info in combination with the small stack size of 4 makes investigations a pain.

I added a simple caching mechanism to aid printing. It's pretty straightforward, but I am still not sure it is worth the complexity.

Here are the numbers:

Running all NMT jtreg tests:

- Stock JVM (no source info): 40 seconds
- Source info: 2 min 30 seconds
- Source info + caching: 1 min 15 seconds

I think that is acceptable. Any more intricate caching would be over the complexity-benefit line.

@gerard-ziemski The cost is with Dwarf parsing, not dladdr. dladdr is cheap. But feel free to make Dwarf parsing cheaper, that would surely be welcome.
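
The caching idea can be sketched roughly as follows. This is not the code in the patch: `SlowDecoder` and `SourceInfoCache` are hypothetical names, the decoder is a stub that only counts how often it is called, and a real cache would also have to think about size limits and locking.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an expensive address-to-source decoder.
// It counts calls so the effect of caching can be observed.
struct SlowDecoder {
  int calls = 0;

  std::string decode(std::uintptr_t pc) {
    ++calls;
    // Made-up result; a real decoder would parse debug information here.
    return "file.cpp:" + std::to_string(pc % 1000);
  }
};

// Memoizes decoder results per program counter, so repeated frames in
// many call stacks pay the decoding cost only once.
class SourceInfoCache {
 public:
  explicit SourceInfoCache(SlowDecoder& decoder) : _decoder(decoder) {}

  const std::string& lookup(std::uintptr_t pc) {
    auto it = _cache.find(pc);
    if (it == _cache.end()) {
      it = _cache.emplace(pc, _decoder.decode(pc)).first;
    }
    return it->second;
  }

 private:
  SlowDecoder& _decoder;
  std::unordered_map<std::uintptr_t, std::string> _cache;
};
```

A plain per-address map like this only helps when the same frames recur across the printed stacks; how much time it saves depends on that repetition.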
------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2173709665 From stuefe at openjdk.org Mon Jun 17 15:33:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 15:33:34 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v2] In-Reply-To: References: Message-ID: > Arenas carry NMT flags. > > An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. > > As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. > > The patch does that: > - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) > - CompilerThread hands in mtCompiler, all other threads rely on the default > - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in > - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena > - it also allows us to make Arena::flags private > > Other, unrelated cleanups: > - Made Arena::_size_in_bytes and Arena::_tag private > - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor > - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. > > Tests: > > I manually verified that the NMT numbers printed don't change. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision:

- feedback johan
- Merge branch 'master' into arena-constify-memflags
- start

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/19693/files
- new: https://git.openjdk.org/jdk/pull/19693/files/78238a42..1dabcc59

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=19693&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=19693&range=00-01

Stats: 3421 lines in 176 files changed: 2449 ins; 476 del; 496 mod
Patch: https://git.openjdk.org/jdk/pull/19693.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19693/head:pull/19693

PR: https://git.openjdk.org/jdk/pull/19693

From stuefe at openjdk.org Mon Jun 17 15:33:34 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 17 Jun 2024 15:33:34 GMT
Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 10:08:49 GMT, Johan Sjölen wrote:

> This looks like a very good cleanup to me, thank you.

Thanks @jdksjolen !

I would like to have someone from the runtime team okay the changes to the thread constructors. Maybe @dholmes-ora ?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19693#issuecomment-2173721837

From amitkumar at openjdk.org Mon Jun 17 15:33:18 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 17 Jun 2024 15:33:18 GMT
Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2]
In-Reply-To:
References:
Message-ID: <2HGiiXycT6tAOaKkQw913hTq5bb9nr05Xlw3kQuinnU=.9d6e298a-f464-4675-88fa-64d12fe7805d@github.com>

On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote:

>> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352)
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
> consistency

thanks, I can see it on ppc-le as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2173720170 From szaldana at openjdk.org Mon Jun 17 15:38:30 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 17 Jun 2024 15:38:30 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v2] In-Reply-To: References: Message-ID: > Hi all, > > This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). > > We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). > > Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: > > - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. > - Running the modified test with all collectors. > > Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. 
> > Looking forward to your comments, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Changes based on feedback and also adding test for serial collector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19699/files - new: https://git.openjdk.org/jdk/pull/19699/files/c1dacf72..756899a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19699&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19699&range=00-01 Stats: 43 lines in 6 files changed: 19 ins; 1 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/19699.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19699/head:pull/19699 PR: https://git.openjdk.org/jdk/pull/19699 From szaldana at openjdk.org Mon Jun 17 15:38:30 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 17 Jun 2024 15:38:30 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v2] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 07:55:16 GMT, Thomas Stuefe wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Changes based on feedback and also adding test for serial collector > > test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 108: > >> 106: } >> 107: Runtime runtime = Runtime.getRuntime(); >> 108: long committedMemory = runtime.totalMemory() / 1024; // in kb > > Why divide by KB? Seems off. Ah yes, this is an oversight on my part. The old test converted to KB as the RSS read from `/proc/pid/status` reported KB. I had already added this conversion in the linux specific implementation of `os::rss`, so I'll go ahead and remove this. 
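
To make the unit problem concrete, here is a small sketch. It assumes the check boils down to "RSS is at least the committed heap size" and that `os::rss` reports bytes; the function names are hypothetical and the sizes are made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t K = 1024;

// Hypothetical check with consistent units: both values are in bytes.
constexpr bool rss_covers_committed(std::uint64_t rss_bytes,
                                    std::uint64_t committed_bytes) {
  return rss_bytes >= committed_bytes;
}

// The mistaken variant: the committed size is scaled down to KB while the
// RSS value stays in bytes, which makes the check far too easy to pass.
constexpr bool rss_covers_committed_mixed_units(std::uint64_t rss_bytes,
                                                std::uint64_t committed_bytes) {
  return rss_bytes >= committed_bytes / K;
}

// A heap with 512 MB committed but only 10 MB resident is not pretouched...
static_assert(!rss_covers_committed(10 * K * K, 512 * K * K),
              "consistent units detect the missing pretouch");
// ...yet the mixed-unit comparison would accept it.
static_assert(rss_covers_committed_mixed_units(10 * K * K, 512 * K * K),
              "mixed units hide the problem");
```

In other words, a leftover conversion on one side of the comparison does not make the test fail; it makes the test pass when it should not, which is why removing it matters.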
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19699#discussion_r1643008407 From stuefe at openjdk.org Mon Jun 17 15:44:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 15:44:16 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 15:38:30 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). >> >> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). >> >> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: >> >> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. >> - Running the modified test with all collectors. >> >> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. >> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Changes based on feedback and also adding test for serial collector LGTM, but I wonder whether we should then move the remaining functionality from the individual TestAlwaysPreTouch here and remove those tests, e.g. 
test/hotspot/jtreg/gc/shenandoah/options/TestAlwaysPreTouch.java @shipilev ? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19699#pullrequestreview-2123247552 From szaldana at openjdk.org Mon Jun 17 15:44:17 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 17 Jun 2024 15:44:17 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 14:35:32 GMT, Ashutosh Mehra wrote: > `TestAlwaysPreTouchBehavior` has covered all GCs except the Serial. Is that intentional? Hi @ashu-mehra, thanks for pointing that out. I've added a test case for the serial collector as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2173745007 From ayang at openjdk.org Mon Jun 17 15:49:12 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Jun 2024 15:49:12 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 15:38:30 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). >> >> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). >> >> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: >> >> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. >> - Running the modified test with all collectors. 
>> >> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. >> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Changes based on feedback and also adding test for serial collector Can you resolve the merge conflict? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2173751846 From shade at openjdk.org Mon Jun 17 15:49:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Jun 2024 15:49:13 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 15:39:19 GMT, Thomas Stuefe wrote: > LGTM, but I wonder whether we should then move the remaining functionality from the individual TestAlwaysPreTouch here and remove those tests, e.g. test/hotspot/jtreg/gc/shenandoah/options/TestAlwaysPreTouch.java I don't think so, leave them be. The simple tests that only check that startup with `-XX:+AlwaysPreTouch` works without crashes should remain, especially if you need to exclude a more comprehensive test for some platforms/configs. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2173752887 From szaldana at openjdk.org Mon Jun 17 15:49:13 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 17 Jun 2024 15:49:13 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 15:38:30 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). >> >> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). >> >> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: >> >> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. >> - Running the modified test with all collectors. >> >> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. >> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Changes based on feedback and also adding test for serial collector Hi folks, I have a conflict with https://github.com/openjdk/jdk/pull/18592. 
Just wanted to verify I should add the `-XX:-UseMadvPopulateWrite` for the new iteration of this test as well. cc: @tstuefe ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2173758918 From ayang at openjdk.org Mon Jun 17 15:54:22 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Jun 2024 15:54:22 GMT Subject: RFR: 8333962: Obsolete OldSize [v3] In-Reply-To: <2teqQqHjCobhvE7McOCbECpQ1z4kxnFg74KoqFHZiFg=.93fae87f-eeef-4010-a458-7a5aebbae869@github.com> References: <2teqQqHjCobhvE7McOCbECpQ1z4kxnFg74KoqFHZiFg=.93fae87f-eeef-4010-a458-7a5aebbae869@github.com> Message-ID: On Mon, 17 Jun 2024 14:30:30 GMT, Albert Mingkun Yang wrote: >> Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > obsolete-old-size Tests pass. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19647#issuecomment-2173767317 From ayang at openjdk.org Mon Jun 17 15:54:23 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Jun 2024 15:54:23 GMT Subject: Integrated: 8333962: Obsolete OldSize In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 08:17:02 GMT, Albert Mingkun Yang wrote: > Obsolete OldSize and related code. An internal variable `OldSize` is kept to capture the capacity of old-gen size. This pull request has now been integrated. 
Changeset: c94af6f9 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/c94af6f943c179553d1827550847b93491d47506 Stats: 192 lines in 15 files changed: 7 ins; 168 del; 17 mod 8333962: Obsolete OldSize Reviewed-by: dholmes, zgu ------------- PR: https://git.openjdk.org/jdk/pull/19647 From stuefe at openjdk.org Mon Jun 17 15:59:23 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 15:59:23 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 15:38:30 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). >> >> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). >> >> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: >> >> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. >> - Running the modified test with all collectors. >> >> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. 
>> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Changes based on feedback and also adding test for serial collector > Hi folks, I have a conflict with #18592. > > Just wanted to verify I should add the `-XX:-UseMadvPopulateWrite` for the new iteration of this test as well. cc: @tstuefe Urgh. Lets leave it out for now, that way we have a warning if this problem re-appears on Oracle Linux. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2173779846 From szaldana at openjdk.org Mon Jun 17 16:09:48 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 17 Jun 2024 16:09:48 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: > Hi all, > > This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). > > We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). > > Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: > > - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. > - Running the modified test with all collectors. > > Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. 
Therefore, there might not be any harm in running these tests as well. > > Looking forward to your comments, > Sonia Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Fixing comment from KB to bytes - Merge master - Changes based on feedback and also adding test for serial collector - 8333769: Pretouching tests dont test pretouching ------------- Changes: https://git.openjdk.org/jdk/pull/19699/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19699&range=02 Stats: 257 lines in 9 files changed: 178 ins; 79 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19699.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19699/head:pull/19699 PR: https://git.openjdk.org/jdk/pull/19699 From amitkumar at openjdk.org Mon Jun 17 16:11:26 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 16:11:26 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: <4trW8f6rCupDrsm0dYVbXD2Spr50bNAykmP3vehEiys=.0410f646-7ba6-4065-8936-8648254a8ddd@github.com> On Sat, 15 Jun 2024 14:37:37 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > consistency latest changes should make things fine: Running test 'gtest:code/server' Note: Google Test filter = code* [==========] Running 2 tests from 1 test suite. [----------] Global test environment set-up. [----------] 2 tests from code [ RUN ] code.vtableStubs_vm [ OK ] code.vtableStubs_vm (0 ms) [ RUN ] code.itableStubs_vm [ OK ] code.itableStubs_vm (0 ms) [----------] 2 tests from code (395 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test suite ran. (395 ms total) [ PASSED ] 2 tests. 
YOU HAVE 1 DISABLED TEST Finished running test 'gtest:code/server' Test report is stored in build/linux-ppc64le-server-fastdebug/test-results/gtest_code_server ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR gtest:code/server 2 2 0 0 ============================== TEST SUCCESS ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2173800881 From amitkumar at openjdk.org Mon Jun 17 16:11:26 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 16:11:26 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v3] In-Reply-To: References: Message-ID: > PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: fixes the test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19733/files - new: https://git.openjdk.org/jdk/pull/19733/files/57fbfe69..e7c60c71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19733&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19733&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19733.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19733/head:pull/19733 PR: https://git.openjdk.org/jdk/pull/19733 From amitkumar at openjdk.org Mon Jun 17 16:11:26 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 17 Jun 2024 16:11:26 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 14:22:22 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> consistency > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1976: > >> 1974: bne(CCR0, L_loop_search_resolved); >> 1975: >> 1976: 
mr_if_needed(holder_offset, scan_temp); > > I prefer using `mr` without "_if_needed" after you asserted that the registers are different. Yeah, makes sense; I have fixed it, please see the latest commit; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19733#discussion_r1643063128 From kvn at openjdk.org Mon Jun 17 16:35:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Jun 2024 16:35:31 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v3] In-Reply-To: <8bWJmx1khK66SJPV3THbBLzDych3zwt5VlRSTeAViOU=.de4d309c-cd56-4963-97f0-3735046d6c27@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <8bWJmx1khK66SJPV3THbBLzDych3zwt5VlRSTeAViOU=.de4d309c-cd56-4963-97f0-3735046d6c27@github.com> Message-ID: On Mon, 17 Jun 2024 05:53:35 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class.
>> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > jvmci test failures fixes I have comments. src/hotspot/cpu/x86/nativeInst_x86.hpp line 469: > 467: // mov rbx, [rip + offset] > 468: //FIXME: Currently not being used in hotspot code base, extend it to support REX2 prefix. > 469: class NativeLoadGot: public NativeInstruction { Leftover from JAOTC implementation. There are few other methods we forgot to remove. You don't need to fix it. Please, remove the comment. src/hotspot/cpu/x86/nativeInst_x86.hpp line 574: > 572: > 573: // far jump reg > 574: //FIXME: Currently not being used in hotspot code base, extend it to support REX2 prefix. JAOTC leftover. src/hotspot/cpu/x86/nativeInst_x86.hpp line 629: > 627: > 628: //FIXME: Register indirect jump interface, currently not being used in hotspot code base, > 629: // extend it to support REX2 prefix if needed. JAOTC leftover. src/hotspot/cpu/x86/nativeInst_x86.hpp line 670: > 668: } > 669: > 670: //FIXME: Currently not being used in hotspot code base, extend it to support REX2 prefix. Some very old code - never used. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 226: > 224: __ subq(rsp, 16 * wordSize); > 225: > 226: __ movq(Address(rsp, 15 * wordSize), rax); Consider moving saving register code into separate method. You have 3 places where you use it. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 535: > 533: __ pop_FPU_state(); > 534: > 535: __ movq(r15, Address(rsp, 0)); The same for restoring registers. 
src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 3375: > 3373: ResourceMark rm; > 3374: > 3375: CodeBuffer buffer(name, 1752, 512); What cause the need to such increase of size? src/hotspot/cpu/x86/vm_version_x86.cpp line 422: > 420: __ movl(Address(rsi,12), rdx); > 421: > 422: #if !defined(PRODUCT) && defined(_LP64) Why it is still under `!PRODUCT`? You need to check if OS supports it in product VM too. Or I missing something? ------------- PR Review: https://git.openjdk.org/jdk/pull/19042#pullrequestreview-2123322141 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643063458 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643072642 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643065954 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643079601 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643084935 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643085861 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643087605 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643089973 From duke at openjdk.org Mon Jun 17 16:38:55 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 17 Jun 2024 16:38:55 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: > This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. > > The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR. 
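(Editorial sketch, not part of the PR.) The remark about partially reduced results can be illustrated with a toy prime field. This is only an analogy under stated assumptions: it uses a single `long` and the prime 2^31 - 1, whereas the JDK field classes use multi-limb representations, and the names `mulPartial`, `mulReduced` and `reduce` are hypothetical. It shows why a product that is congruent but not fully reduced can be fed into an addition, with one reduction at the end:

```java
public class LazyReductionSketch {
    // Toy modulus (assumption for illustration): the Mersenne prime 2^31 - 1.
    static final long P = 2147483647L;

    // Fully reduced multiply: result is in [0, P).
    static long mulReduced(long a, long b) {
        return (a * b) % P;
    }

    // Partially reduced multiply: one folding step using 2^31 == 1 (mod P).
    // The result is congruent to a*b mod P but may be >= P (it is < 2^32).
    // Inputs are expected to be in [0, 2^31).
    static long mulPartial(long a, long b) {
        long t = a * b;               // < 2^62, fits in a long
        return (t & P) + (t >>> 31);  // low 31 bits + high bits
    }

    // Final reduction into [0, P).
    static long reduce(long x) {
        return x % P;
    }

    public static void main(String[] args) {
        long a = 123456789L, b = 987654321L, c = 555555555L, d = 444444444L;
        // Eager: reduce after every multiply and again after the add.
        long eager = (mulReduced(a, b) + mulReduced(c, d)) % P;
        // Lazy: add partially reduced products, reduce once at the end.
        long lazy = reduce(mulPartial(a, b) + mulPartial(c, d));
        System.out.println(eager == lazy);
    }
}
```

The saving only materialises when the consumer of the product tolerates a non-canonical value; a consumer that needs the canonical value has to pay for the full reduction anyway.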
> > --- > XDH.generateSecret performance > before Montgomery PR: > > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s > > after Montgomery PR: > > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 22.071 ops/s > > with this PR: > > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ± 32.858 ops/s > > --- > > P256 performance with/without mult intrinsic: > > Performance before Montgomery PR: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s > > Performance in master without mult() intrinsic > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Err...
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: comment from Sandhya ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19728/files - new: https://git.openjdk.org/jdk/pull/19728/files/2ab7bcbd..960b8333 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19728&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19728&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19728.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19728/head:pull/19728 PR: https://git.openjdk.org/jdk/pull/19728 From duke at openjdk.org Mon Jun 17 16:38:55 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 17 Jun 2024 16:38:55 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 23:39:54 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve non-intrinsic p256 performance > > src/hotspot/share/opto/runtime.cpp line 1417: > >> 1415: // result type needed >> 1416: fields = TypeTuple::fields(1); >> 1417: fields[TypeFunc::Parms + 0] = NULL; > > A minor nit: here NULL could be nullptr instead. done, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19728#discussion_r1643093891 From iklam at openjdk.org Mon Jun 17 16:55:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 17 Jun 2024 16:55:19 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 06:39:45 GMT, Thomas Stuefe wrote: >> An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK. 
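(Editorial sketch, not part of the PR.) One way a test harness could tell an exploded JDK from a regular image is sketched below. It relies on the assumption that an exploded build has no `lib/modules` runtime image file under its home directory; the helper names `looksExploded` and `cdsUsableForTests` are hypothetical and do not describe how the actual fix is implemented:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplodedImageSketch {
    // Hypothetical helper: treats a JDK home without a regular lib/modules
    // file as an exploded build (assumption stated in the text above).
    static boolean looksExploded(Path javaHome) {
        return !Files.isRegularFile(javaHome.resolve("lib").resolve("modules"));
    }

    // Hypothetical stand-in for a "vm.cds"-style test property: CDS counts
    // as available only if it was built in and the JDK is not exploded.
    static boolean cdsUsableForTests(boolean cdsIncludedInBuild, Path javaHome) {
        return cdsIncludedInBuild && !looksExploded(javaHome);
    }

    public static void main(String[] args) throws IOException {
        Path home = Path.of(System.getProperty("java.home"));
        System.out.println("exploded: " + looksExploded(home));
        System.out.println("vm.cds-like value: " + cdsUsableForTests(true, home));
    }
}
```

Folding the check into a single CDS-availability value keeps individual tests unchanged, at the cost of hiding why CDS is reported as unavailable.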
>> >> Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included. >> >> >> ---- >> >> Note that I was torn between two ways to fix this: >> >> - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property >> - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`). See this draft PR: https://github.com/openjdk/jdk/pull/19178 . >> >> The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required. >> >> I can go either way, though I have a slight preference for this PR, which is why I posted it. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build > - JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19188#pullrequestreview-2123409301 From lmesnik at openjdk.org Mon Jun 17 17:18:01 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 17 Jun 2024 17:18:01 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share [v4] In-Reply-To: References: Message-ID: > The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. > The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. 
> > Also, this PR moves test-specific code into corresponding test directories to increase code locality. This allows later easier move tests from vmTestbase. > > The few remaining classes include > InMemoryJavaCompiler.java > that is very similar to same class from the standard testlibrary and could be merge with it and > ProcessUtils.java > which is used by > test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java > and thus should be moved into the standard testlibrary. > The stack and options might be merged in nsk/share test library. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed imports. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19727/files - new: https://git.openjdk.org/jdk/pull/19727/files/53757d6f..12f63cc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19727&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19727&range=02-03 Stats: 25 lines in 6 files changed: 11 ins; 9 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19727/head:pull/19727 PR: https://git.openjdk.org/jdk/pull/19727 From cjplummer at openjdk.org Mon Jun 17 17:26:27 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 17 Jun 2024 17:26:27 GMT Subject: RFR: 8332252: Clean up vmTestbase/vm/share [v4] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:18:01 GMT, Leonid Mesnik wrote: >> The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used by only by small number of tests or not used at all. There are no plans to actively develop new tests in vmTestsbase and improve this shared library. >> The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. >> >> Also, this PR moves test-specific code into corresponding test directories to increase code locality. 
This allows later easier move tests from vmTestbase. >> >> The few remaining classes include >> InMemoryJavaCompiler.java >> that is very similar to same class from the standard testlibrary and could be merge with it and >> ProcessUtils.java >> which is used by >> test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java >> and thus should be moved into the standard testlibrary. >> The stack and options might be merged in nsk/share test library. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed imports. Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19727#pullrequestreview-2123463448 From stuefe at openjdk.org Mon Jun 17 17:30:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 17:30:26 GMT Subject: Integrated: 8332105: Exploded JDK does not include CDS In-Reply-To: References: Message-ID: On Sat, 11 May 2024 06:13:29 GMT, Thomas Stuefe wrote: > An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK. > > Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included. > > > ---- > > Note that I was torn between two ways to fix this: > > - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property > - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`). See this draft PR: https://github.com/openjdk/jdk/pull/19178 . > > The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required. > > I can go either way, though I have a slight preference for this PR, which is why I posted it. This pull request has now been integrated. 
Changeset: 801bf15f Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/801bf15f02ca47c3547eb677079d7d2f3af1de8c Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod 8332105: Exploded JDK does not include CDS Reviewed-by: dholmes, iklam ------------- PR: https://git.openjdk.org/jdk/pull/19188 From stuefe at openjdk.org Mon Jun 17 17:30:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Jun 2024 17:30:25 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 09:57:31 GMT, Amit Kumar wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build >> - JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build > > I still see some failures which are only failing in exploded-jvm; > > > java/foreign/TestLinker.java > java/lang/SecurityManager/CheckSecurityProvider.java > java/lang/StackWalker/VerifyStackTrace.java > java/lang/System/LoggerFinder/internal/BaseDefaultLoggerFinderTest/BaseDefaultLoggerFinderTest.java > java/lang/System/LoggerFinder/internal/BootstrapLogger/BootstrapLoggerTest.java > java/lang/System/LoggerFinder/internal/LoggerFinderLoaderTest/LoggerFinderLoaderTest.java > java/lang/invoke/RevealDirectTest.java > java/lang/invoke/lambda/LogGeneratedClassesTest.java > java/lang/reflect/records/IsRecordTest.java > java/lang/reflect/records/RecordPermissionsTest.java > java/lang/reflect/records/RecordReflectionTest.java > java/lang/runtime/ObjectMethodsTest.java > java/util/Currency/PropertiesTestRun.java > java/util/ResourceBundle/Bug6359330.java > java/util/TimeZone/TimeZoneDatePermissionCheckRun.java > 
java/util/logging/LogManager/Configuration/rootLoggerHandlers/BadRootLoggerHandlers.java > java/util/logging/LogManager/Configuration/rootLoggerHandlers/RootLoggerHandlers.java > java/util/logging/LogManager/Configuration/updateConfiguration/SimpleUpdateConfigWithInputStreamTest.java > java/util/logging/LogManager/Configuration/updateConfiguration/UpdateConfigurationTest.java > java/util/logging/Logger/getGlobal/TestGetGlobal.java > java/util/logging/Logger/getGlobal/TestGetGlobalByName.java > java/util/logging/Logger/getGlobal/TestGetGlobalConcurrent.java > java/util/logging/Logger/setResourceBundle/TestSetResourceBundle.java > java/util/logging/TestMainAppContext.java > jdk/internal/jimage/JImageReadTest.java > jdk/modules/etc/JmodExcludedFiles.java > sun/reflect/ReflectionFactory/ReflectionFactoryTest.java > sun/util/locale/provider/Bug8152817.java > > > Did you also notice them in tier1 with exploded-build ? > > I only looked at `TestLinker.java` failure, It doesn't "seem" to be related to CDS: > > TEST RESULT: Failed. Execution failed: `main' threw exception: java.util.ServiceConfigurationError: Locale provider adapter "CLDR"cannot be instantiated. @offamitkumar thanks for testing. If these tests are dependent on CDS, they should have the requirement specified. @iklam thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2173947915 From sviswanathan at openjdk.org Mon Jun 17 17:50:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 17 Jun 2024 17:50:12 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. 
this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR. >> >> --- >> XDH.generateSecret performance >> before Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s >> >> after Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 22.071 ops/s >> >> with this PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ± 32.858 ops/s >> >> --- >> >> P256 performance with/without mult intrinsic: >> >> Performance before Montgomery PR: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s >> >> Performance in master without mult() intrinsic >> >> Benchmark ... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Sandhya Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19728#pullrequestreview-2123511329

From kvn at openjdk.org Mon Jun 17 18:15:13 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 17 Jun 2024 18:15:13 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote:

> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   comment from Sandhya

What causes regression in P256 "(~-8-14%)"? From what I see, you re-arranged code to not execute some code ("reducePositive()") when it is not needed. How does this affect P256?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174028348

From duke at openjdk.org Mon Jun 17 18:55:16 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 17 Jun 2024 18:55:16 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 18:12:16 GMT, Vladimir Kozlov wrote:

> What causes regression in P256 "(~-8-14%)"? From what I see, you re-arranged code to not execute some code ("reducePositive()") when it is not needed. How does this affect P256?

Actually, the other way around; reducePositive is now unconditionally executed for both pure java and the intrinsic paths. Perhaps that's what is misleading, it was only the mult() intrinsic that was taking advantage of this 'skip reduction' before. (pure java did not benefit from removing reduction, so I kept it.
Now 'keeping it' for both paths)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174201539

From ascarpino at openjdk.org Mon Jun 17 19:25:13 2024
From: ascarpino at openjdk.org (Anthony Scarpino)
Date: Mon, 17 Jun 2024 19:25:13 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 18:51:33 GMT, Volodymyr Paprotski wrote:

>> What causes regression in P256 "(~-8-14%)"?
>> From what I see, you re-arranged code to not execute some code ("reducePositive()") when it is not needed. How does this affect P256?
>
>> What causes regression in P256 "(~-8-14%)"? From what I see, you re-arranged code to not execute some code ("reducePositive()") when it is not needed. How does this affect P256?
>
> Actually, the other way around; reducePositive is now unconditionally executed for both pure java and the intrinsic paths. Perhaps that's what is misleading, it was only the mult() intrinsic that was taking advantage of this 'skip reduction' before. (pure java did not benefit from removing reduction, so I kept it. Now 'keeping it' for both paths)

Hi @vpaprotsk, @ferakocz is going to take a look at the change. When he says it's ok, I'll approve the PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174249460

From kvn at openjdk.org Mon Jun 17 19:25:14 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 17 Jun 2024 19:25:14 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 18:51:33 GMT, Volodymyr Paprotski wrote:

> Actually, the other way around; reducePositive is now unconditionally executed for both pure java and the intrinsic paths.

Looking at `MontgomeryIntegerPolynomialP256.java` the code in `multImpl() + reducePositive()` is similar to original `mult()` except new additional code at the end of `multImpl()`.
Now you intrinsify only `multImpl()`. Looks like `reducePositive()` is not included into intrinsic and will be normally JIT compiled (hopefully inlined when JIT compiling `mult()`). Then what do you mean in above statement?

Also you did not change assembler for intrinsic but you changed corresponding Java code (`multImpl()`). How does it work?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174250094

From duke at openjdk.org Mon Jun 17 20:30:16 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 17 Jun 2024 20:30:16 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 19:22:01 GMT, Vladimir Kozlov wrote:

> Looking at `MontgomeryIntegerPolynomialP256.java` the code in `multImpl() + reducePositive()` is similar to original `mult()` except new additional code at the end of `multImpl()`.

Yep, I split the original java mult() into multImpl() and reducePositive().

> Now you intrinsify only `multImpl()`. Looks like `reducePositive()` is not included into intrinsic and will be normally JIT compiled (hopefully inlined when JIT compiling `mult()`). Then what do you mean in above statement?
> Also you did not change assembler for intrinsic but you changed corresponding Java code (`multImpl()`). How does it work?

The intrinsic used to return 1 (i.e. numAdds = 1), which would let the next operation decide if it needed to do the reduction or skip it. Now reducePositive() reduction always happens after the intrinsic (when it could have been skipped before).
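
For readers following the exchange above, here is a minimal Java sketch of the split being discussed. It is an illustration only, not the JDK's `MontgomeryIntegerPolynomialP256` code: the names `multImpl()` and `reducePositive()` are borrowed from the thread, but the single-`long` arithmetic, the modulus `P = 97`, and the `[0, 2P)` range chosen for the partially reduced product are assumptions made to keep the example small.

```java
// Toy illustration only; the real field code works on multi-limb values.
public final class LazyReductionSketch {
    // Hypothetical small prime modulus, chosen for readability.
    public static final long P = 97;

    // Partial multiply for non-negative inputs: the result is congruent to
    // a*b mod P but is only guaranteed to lie in [0, 2P).
    public static long multImpl(long a, long b) {
        return (a * b) % (2 * P);
    }

    // Final reduction for a value already in [0, 2P): bring it into [0, P).
    public static long reducePositive(long r) {
        return r >= P ? r - P : r;
    }

    // Always-reduced multiply, in the spirit of the split described above.
    public static long mult(long a, long b) {
        return reducePositive(multImpl(a, b));
    }

    public static void main(String[] args) {
        long a = 45, b = 60, c = 13;
        // Eager: fully reduce right after the multiply, then add and reduce.
        long eager = (mult(a, b) + c) % P;
        // Deferred: feed the partially reduced product straight into the
        // addition and reduce once at the end.
        long deferred = (multImpl(a, b) + c) % P;
        System.out.println(eager + " " + deferred);
    }
}
```

Under these assumptions both paths produce the same residue; the deferred path just skips one `reducePositive()` call, which is the kind of saving the thread attributes to the intrinsic's former `numAdds = 1` return.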
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174364189

From kvn at openjdk.org Mon Jun 17 21:23:17 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 17 Jun 2024 21:23:17 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote:

> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   comment from Sandhya

Let me know that I got it right:
- The reduction operation was optional and P256 benefitted by not executing it.
- Previous `mult()` **Java** code always returned 0 because it executes reduction so callers do not need to do it.
- `_intpoly_montgomeryMult_P256` intrinsic code executes only part of code from previous `mult()` and it returns 1 to indicate that reduction should be executed if needed.
- Now `mult()` is split into 2 methods (with `multImpl()` intrinsified) and always executes reduction so it can return 0.

I like new implementation because intrinsic matches Java code. It would avoid the confusion I had.

The only question left: do we need to do something about Java code which checks return value? It is always 0 now. And I don't see you changed such checks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174446053

From jbhateja at openjdk.org Mon Jun 17 21:31:27 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 17 Jun 2024 21:31:27 GMT
Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v4]
In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com>
References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com>
Message-ID: <0D26UeLicWc2ENVCMPuOwolah2NoK7bQN99yRm4x_Eg=.7a339f33-58fb-4d09-a870-3b1ae0d813f8@github.com>

> Intel® Advanced Performance Extensions (Intel®
APX) adds 16 new 64-bit general-purpose registers, also known as Extended General Purpose Registers, in IA-32e 64-bit mode.
>
> Summary of changes introduced along with this patch:
>
> 1. C2 compiler register allocation support.
> 2. Architecture state save/restoration while transitioning from C1/C2 JIT compiled code to runtime services.
> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation.
> 4. Applicable extensions to native interface used by runtime for patching instruction.
>
> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits
> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves
> remaining registers for special purpose.
>
> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR registers by manually modifying the register sequences in relevant allocation class.
>
> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes
> found during testing.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review comments resolutions.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19042/files - new: https://git.openjdk.org/jdk/pull/19042/files/f13a5574..3efdbb73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=02-03 Stats: 138 lines in 8 files changed: 51 ins; 77 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From jbhateja at openjdk.org Mon Jun 17 21:31:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 17 Jun 2024 21:31:27 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v3] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <8bWJmx1khK66SJPV3THbBLzDych3zwt5VlRSTeAViOU=.de4d309c-cd56-4963-97f0-3735046d6c27@github.com> Message-ID: <1CKckcyXdKpr9J3cuxWZBaEBDT41RqP9hqTKxYZoxp0=.863e0cf9-fb71-42cd-9adf-e3d671cdbc34@github.com> On Mon, 17 Jun 2024 16:29:27 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> jvmci test failures fixes > > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 3375: > >> 3373: ResourceMark rm; >> 3374: >> 3375: CodeBuffer buffer(name, 1752, 512); > > What cause the need to such increase of size? Stub makes multiple save restoration calls, additional buffer size needed to accommodate to move EGPRs from / to stack. Further constraining the size. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643449420

From jbhateja at openjdk.org Mon Jun 17 21:35:10 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 17 Jun 2024 21:35:10 GMT
Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v3]
In-Reply-To:
References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <8bWJmx1khK66SJPV3THbBLzDych3zwt5VlRSTeAViOU=.de4d309c-cd56-4963-97f0-3735046d6c27@github.com>
Message-ID:

On Mon, 17 Jun 2024 16:31:37 GMT, Vladimir Kozlov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   jvmci test failures fixes
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 422:
>
>> 420: __ movl(Address(rsi,12), rdx);
>> 421:
>> 422: #if !defined(PRODUCT) && defined(_LP64)
>
> Why it is still under `!PRODUCT`? You need to check if OS supports it in product VM too. Or I missing something?

Idea is to enable the APX support for product builds only after completion of planned features listed in JDK-8329030.

I have moved the check to [appropriate place](https://github.com/openjdk/jdk/pull/19042/files#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R3186)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643452680

From duke at openjdk.org Mon Jun 17 21:55:10 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 17 Jun 2024 21:55:10 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 21:21:01 GMT, Vladimir Kozlov wrote:

> Let me know that I got it right:
>
> * The reduction operation was optional and P256 benefitted by not executing it.
> * Previous `mult()` **Java** code always returned 0 because it executes reduction so callers do not need to do it.
> * `_intpoly_montgomeryMult_P256` intrinsic code executes only part of code from previous `mult()` and it returns 1 to indicate that reduction should be executed if needed.
> * Now `mult()` is split into 2 methods (with `multImpl()` intrinsified) and always executes reduction so it can return 0.

That's it exactly. Except I would correct the last two words `return 0`. It is now void so no return (and I imagine that is why XDH did not like it; having it hardcoded to 0, without having to do inlining, opens the doors for some more optimizations). Also, the code I had checked in as part of Montgomery PR was returning 0 everywhere but the intrinsic.

> I like new implementation because intrinsic matches Java code. It would avoid the confusion I had.

I disliked this too. I originally removed the Java reduction too, but it hurt the non-intrinsic performance, so put it back in. (Before I got distracted with this bug, I was actually working on next ECC iteration, and was trying to fix this mismatch. But I also hadn't realized how much this optimization actually helped.)

There is also a 'bigger' complaint.. this optimization tried to use virtual methods to specialize one particular curve. Fairly standard practice. And it brought the other 'unaffected' curve down. If I can't use virtual methods for further optimizations.. how am I supposed to optimize further? Hmm. Not the time to discuss an answer, this release is going out, not the time to get 'creative', but this will give me problems next time I try to add code here.

> The only question left: do we need to do something about Java code which checks return value? It is always 0 now. And I don't see you changed such checks.

(Correction: no return, void). numAdds is now again pretty much a 'private' concept to the IntegerPolynomial class, so figure it was fine before, it should be fine now?
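
The difference between the two `mult()` shapes discussed above can be sketched as follows. This is a hypothetical toy, not code from `IntegerPolynomial`: the interfaces `ReportingField` and `ReducingField`, the modulus, and the helper methods are invented for illustration. It only aims to show why a `void` `mult()` lets the caller treat its `numAdds` bookkeeping as a constant rather than a value that depends on which implementation ran.

```java
// Toy sketch; all names here are hypothetical and not taken from the JDK.
public final class MultReturnStyleSketch {
    static final long P = 97; // assumed small modulus

    /** Variant 1: mult() reports 0 (fully reduced) or 1 (partially reduced). */
    public interface ReportingField {
        int mult(long a, long b, long[] out);
    }

    /** Variant 2: mult() always fully reduces, so there is nothing to report. */
    public interface ReducingField {
        void mult(long a, long b, long[] out);
    }

    // Stand-in for a pure-Java path that reduces and therefore reports 0.
    public static final ReportingField JAVA_LIKE = (a, b, out) -> {
        out[0] = (a * b) % P;
        return 0;
    };

    // Stand-in for a path that leaves the result in [0, 2P) and reports 1.
    public static final ReportingField INTRINSIC_LIKE = (a, b, out) -> {
        out[0] = (a * b) % (2 * P);
        return 1;
    };

    // Stand-in for the void-style multiply that always reduces.
    public static final ReducingField ALWAYS_REDUCED = (a, b, out) -> {
        out[0] = (a * b) % P;
    };

    /** Caller for variant 1: the bookkeeping depends on what f reports. */
    public static int numAddsReporting(ReportingField f, long a, long b, long[] out) {
        return f.mult(a, b, out);
    }

    /** Caller for variant 2: the bookkeeping after a product is always 0. */
    public static int numAddsReducing(ReducingField f, long a, long b, long[] out) {
        f.mult(a, b, out);
        return 0;
    }

    public static void main(String[] args) {
        long[] out = new long[1];
        System.out.println(numAddsReporting(INTRINSIC_LIKE, 45, 60, out) + " " + out[0]);
        System.out.println(numAddsReducing(ALWAYS_REDUCED, 45, 60, out) + " " + out[0]);
    }
}
```

Whether a JIT compiler actually exploits the constant in the second caller is a separate question that this sketch does not try to answer.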
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174493638

From kvn at openjdk.org Mon Jun 17 23:13:10 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 17 Jun 2024 23:13:10 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID: <7HBid31iWxucnqfOrXblIH0zftUAnIwKTT1YtFGwkFs=.11485991-1bae-43a9-be45-5bcd681a8f18@github.com>

On Mon, 17 Jun 2024 21:52:22 GMT, Volodymyr Paprotski wrote:

> numAdds is now again pretty much a 'private' concept to the IntegerPolynomial class, so figure it was fine before, it should be fine now?

I did not mean it for these changes but as a general improvement of code in another RFE. But it is up to core libs group to decide.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174593762

From kvn at openjdk.org Mon Jun 17 23:32:12 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 17 Jun 2024 23:32:12 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID: <8ZwlWoI93ja8DTfsPYPOCjVY4rQd8CpAhpsWTkyqYkg=.b0447eed-81ed-41f8-a7b4-039ca4d29b47@github.com>

On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote:

> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   comment from Sandhya

Talking about future improvements. Is it possible to optimize reduction code by converting it to intrinsic too? Or is the code generated by C2 good enough?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174616789 From kvn at openjdk.org Mon Jun 17 23:55:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 17 Jun 2024 23:55:11 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v3] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <8bWJmx1khK66SJPV3THbBLzDych3zwt5VlRSTeAViOU=.de4d309c-cd56-4963-97f0-3735046d6c27@github.com> Message-ID: On Mon, 17 Jun 2024 21:31:36 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 422: >> >>> 420: __ movl(Address(rsi,12), rdx); >>> 421: >>> 422: #if !defined(PRODUCT) && defined(_LP64) >> >> Why it is still under `!PRODUCT`? You need to check if OS supports it in product VM too. Or I missing something? > > Idea is to enable the APX support for product builds only after completion of planned features listed in JDK-8329030. > > I have moved the check to [appropriate place](https://github.com/openjdk/jdk/pull/19042/files#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R3186) Thank you for moving `PRODUCT` check and adding comment. New save/restore methods should be under `#ifdef _LP64`. 32-bit build is broken again because of that. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1643560772

From duke at openjdk.org Tue Jun 18 00:54:18 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Tue, 18 Jun 2024 00:54:18 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To: <8ZwlWoI93ja8DTfsPYPOCjVY4rQd8CpAhpsWTkyqYkg=.b0447eed-81ed-41f8-a7b4-039ca4d29b47@github.com>
References: <8ZwlWoI93ja8DTfsPYPOCjVY4rQd8CpAhpsWTkyqYkg=.b0447eed-81ed-41f8-a7b4-039ca4d29b47@github.com>
Message-ID: <_zQhvRV5931epNYRcVbD7CGAnhOZOu7fresZxhkzkJU=.bc041541-0650-47b7-a8e0-1ad89fc126f7@github.com>

On Mon, 17 Jun 2024 23:29:18 GMT, Vladimir Kozlov wrote:

> Talking about future improvements. Is it possible to optimize reduction code by converting it to intrinsic too? Or code generated by C2 is good enough?

I had some experiments to try where I was using virtual methods to add optimizations, similar to the optimization here (i.e. the default method 'does nothing' and have just one override).

Perhaps this issue could have been solved differently and there is something to do on the compiler side i.e. requires a specific order of optimizations.. specialize the IntegerPolynomial.setProduct() hot path for XDH field type, inline mult() from XDH field, realize that the return is always zero, which allows whatever optimizations that weren't run for 4% performance. (I don't yet know enough about the C2 to be able to answer or 'fix' that)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174701231

From kvn at openjdk.org Tue Jun 18 01:32:12 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 18 Jun 2024 01:32:12 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote:

> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   comment from Sandhya

There are examples in C2 how to check method's class holder (intrinsic's predicate) before executing intrinsic code. See, for example, code for `_counterMode_AESCrypt` in `library_call.cpp`.
I am not sure if this is what you are asking for.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2174736168

From asmehra at openjdk.org Tue Jun 18 02:28:10 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Tue, 18 Jun 2024 02:28:10 GMT
Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 16:09:48 GMT, Sonia Zaldana Calles wrote:

>> Hi all,
>>
>> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769).
>>
>> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)).
>>
>> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes:
>>
>> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms.
>> - Running the modified test with all collectors.
>>
>> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well.
>> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Fixing comment from KB to bytes > - Merge master > - Changes based on feedback and also adding test for serial collector > - 8333769: Pretouching tests dont test pretouching lgtm ------------- Marked as reviewed by asmehra (Committer). PR Review: https://git.openjdk.org/jdk/pull/19699#pullrequestreview-2124263576 From amitkumar at openjdk.org Tue Jun 18 02:39:10 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 18 Jun 2024 02:39:10 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:11:26 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > fixes the test case I did one more round of `tier1` tests, here is result: Finished running test 'jtreg:test/lib-test:tier1' Test report is stored in build/linux-ppc64le-server-fastdebug/test-results/jtreg_test_lib_test_tier1 ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:tier1 2470 2470 0 0 >> jtreg:test/jdk:tier1 2413 2412 1 0 << >> jtreg:test/langtools:tier1 4534 4533 1 0 << jtreg:test/jaxp:tier1 0 0 0 0 jtreg:test/lib-test:tier1 33 33 0 0 ============================== TEST FAILURE make[1]: *** [/root/amit/jdk/make/Init.gmk:327: main] Error 1 make: *** [/root/amit/jdk/make/Init.gmk:189: run-test-tier1] Error 2 root at crampon1:~/amit/jdk# cat $(f newfailures.txt) # newfailures.txt # newfailures.txt # newfailures.txt java/util/ResourceBundle/Control/MissingResourceCauseTestRun.java # newfailures.txt 
jdk/javadoc/doclet/testIOException/TestIOException.java
# newfailures.txt
# newfailures.txt
root at crampon1:~/amit/jdk#

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2174841211

From kvn at openjdk.org Tue Jun 18 03:04:11 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 18 Jun 2024 03:04:11 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 09:49:25 GMT, Roberto Castañeda Lozano wrote:

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>
> ## Summary of the Changes
>
> ### Platform-Independent Changes (`src/hotspot/share`)
>
> These consist mainly of:
>
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
>
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Just first glance. In G1 specific .ad files predicate has check `UseG1GC && n->as_Store()->barrier_data() != 0` But in normal .ad files you check only `n->as_Store()->barrier_data() == 0`. From what I see `barrier_data` is set only by G1 code now. But then why you check for `UseG1GC` in G1 specific .ad? I also have a comment about generating relocation for card table base address. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 297: > 295: // Do not use ExternalAddress to load 'byte_map_base', since 'byte_map_base' is NOT > 296: // a valid address and therefore is not properly handled by the relocation code.
> 297: __ movptr(tmp2, (intptr_t)ct->card_table()->byte_map_base()); // tmp2 := card table base address Consider using `lea` instructions and `ExternalAddress` to generate relocation: __ lea(tmp2, ExternalAddress((address)ct->card_table()->byte_map_base())); The same in `generate_c1_post_barrier_runtime_stub()` Leyden needs relocation for card table base. ------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2124316272 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1643661201 From kvn at openjdk.org Tue Jun 18 03:08:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Jun 2024 03:08:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 03:01:56 GMT, Vladimir Kozlov wrote: > But then why you check for `UseG1GC` in G1 specific .ad? After some thinking, it seems reasonable to do if we intend to add such .ad files for other GCs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2174872639 From stuefe at openjdk.org Tue Jun 18 05:47:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 18 Jun 2024 05:47:16 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:09:48 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). >> >> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). >> >> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: >> >> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. 
This in turn allows us to generalize the above test to be used across all platforms. >> - Running the modified test with all collectors. >> >> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. >> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Fixing comment from KB to bytes > - Merge master > - Changes based on feedback and also adding test for serial collector > - 8333769: Pretouching tests dont test pretouching Still good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2175060160 From jbhateja at openjdk.org Tue Jun 18 07:06:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 18 Jun 2024 07:06:39 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v5] In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: <-EktKXV73_sjXl4zEVd4LQUO3Llq5o1dzDFPYQTitdM=.91b6e91b-d5ea-4996-b8d6-704107112722@github.com> > Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. > > Summary of changes introduced along with this patch:- > > 1. C2 compiler register allocation support. > 2.
Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. > 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. > 4. Applicable extensions to native interface used by runtime for patching instruction. > > We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits > (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves > remaining register for special purpose. > > Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. > > We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes > found during testing. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 32-bit build fix. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19042/files - new: https://git.openjdk.org/jdk/pull/19042/files/3efdbb73..9c90080b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=03-04 Stats: 5 lines in 2 files changed: 4 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From cstein at openjdk.org Tue Jun 18 07:14:11 2024 From: cstein at openjdk.org (Christian Stein) Date: Tue, 18 Jun 2024 07:14:11 GMT Subject: RFR: 8331431: Update to use jtreg 7.4 In-Reply-To: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> References: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> Message-ID: On Thu, 2 May 2024 09:48:51 GMT, Christian Stein wrote: > Please review the change to update to using `jtreg` **7.4**. > > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files. > > Tested: tier 1 ? tier 8 Tests in tier 6-8 also look good. About to make `jtreg 7.4` the default now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19052#issuecomment-2175335129 From cstein at openjdk.org Tue Jun 18 07:30:18 2024 From: cstein at openjdk.org (Christian Stein) Date: Tue, 18 Jun 2024 07:30:18 GMT Subject: Integrated: 8331431: Update to use jtreg 7.4 In-Reply-To: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> References: <_Q1eZAC0M9Q3B8idE8tfSg0TZ7Lh-tXoLdcbV4LZsa4=.392cc442-74ce-47c0-aea5-eaee500da7c1@github.com> Message-ID: On Thu, 2 May 2024 09:48:51 GMT, Christian Stein wrote: > Please review the change to update to using `jtreg` **7.4**. 
> > The primary change is to the `jib-profiles.js` file, which specifies the version of jtreg to use, for those systems that rely on this file. In addition, the `requiredVersion` has been updated in the various `TEST.ROOT` files. > > Tested: tier 1 ? tier 8 This pull request has now been integrated. Changeset: 99fefec0 Author: Christian Stein URL: https://git.openjdk.org/jdk/commit/99fefec092f49cd759f93aa75e008cfa06d2a183 Stats: 12 lines in 8 files changed: 0 ins; 0 del; 12 mod 8331431: Update to use jtreg 7.4 Reviewed-by: ihse, erikj, jpai ------------- PR: https://git.openjdk.org/jdk/pull/19052 From dholmes at openjdk.org Tue Jun 18 09:46:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Jun 2024 09:46:21 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 15:33:34 GMT, Thomas Stuefe wrote: >> Arenas carry NMT flags. >> >> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. >> >> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. 
>> >> The patch does that: >> - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) >> - CompilerThread hands in mtCompiler, all other threads rely on the default >> - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in >> - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena >> - it also allows us to make Arena::flags private >> >> Other, unrelated cleanups: >> - Made Arena::_size_in_bytes and Arena::_tag private >> - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor >> - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. >> >> Tests: >> >> I manually verified that the NMT numbers printed don't change. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - feedback johan > - Merge branch 'master' into arena-constify-memflags > - start Changes requested by dholmes (Reviewer). src/hotspot/share/runtime/javaThread.hpp line 1138: > 1136: bool is_attaching_via_jni() const { return _jni_attach_state == _attaching_via_jni; } > 1137: bool has_attached_via_jni() const { return is_attaching_via_jni() || _jni_attach_state == _attached_via_jni; } > 1138: inline void set_is_attaching_via_jni(); So the only thing I don't like about this aspect is that this has to be called at construction time, so having it be a stand-alone function call invites misuse. 
I'm tempted to add a factory method: JavaThread* JavaThread::new_attaching_thread() { JavaThread* jt = new JavaThread(); jt->set_is_attaching_via_jni(); return jt; } and make `set_is_attaching_via_jni()` private. What do you think? src/hotspot/share/runtime/javaThread.inline.hpp line 201: > 199: inline void JavaThread::set_is_attaching_via_jni() { > 200: _jni_attach_state = _attaching_via_jni; > 201: OrderAccess::fence(); No need for a fence here as this should be set before the thread has been "published" and is visible to anyone else. In contrast `set_done_attaching_via_jni` is set after the thread has been published. ------------- PR Review: https://git.openjdk.org/jdk/pull/19693#pullrequestreview-2125019202 PR Review Comment: https://git.openjdk.org/jdk/pull/19693#discussion_r1644155425 PR Review Comment: https://git.openjdk.org/jdk/pull/19693#discussion_r1644156749 From ayang at openjdk.org Tue Jun 18 10:02:19 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 18 Jun 2024 10:02:19 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:09:48 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). >> >> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). >> >> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: >> >> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. >> - Running the modified test with all collectors.
>> >> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. >> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Fixing comment from KB to bytes > - Merge master > - Changes based on feedback and also adding test for serial collector > - 8333769: Pretouching tests dont test pretouching I recall there's an issue with debug builds due to `ZapUnusedHeapArea`, so that tests would still pass even if the pretouching logic is broken/disabled in the VM. Have you checked whether the test case fails if you purposely introduce a bug in the pretouch logic in the VM? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2175700039 From rcastanedalo at openjdk.org Tue Jun 18 10:22:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 18 Jun 2024 10:22:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 03:05:04 GMT, Vladimir Kozlov wrote: > > But then why you check for `UseG1GC` in G1 specific .ad? > > After some thinking, it seems reasonable to do if we intend to add such .ad files for other GCs. Right, note that this is already the case for ZGC, see e.g.
https://github.com/openjdk/jdk/blob/614b99a8f8360dc0a6a018f06fb336c6883f0f4a/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L117 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2175742957 From rcastanedalo at openjdk.org Tue Jun 18 10:22:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 18 Jun 2024 10:22:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 02:46:08 GMT, Vladimir Kozlov wrote: > Consider using lea instructions and ExternalAddress to generate relocation: Just for my understanding, do you mean that the comment immediately above (introduced in [JDK-8028109](https://bugs.openjdk.org/browse/JDK-8028109)): // Do not use ExternalAddress to load 'byte_map_base', since 'byte_map_base' is NOT // a valid address and therefore is not properly handled by the relocation code. does not hold anymore in mainline? > The same in generate_c1_post_barrier_runtime_stub() Note that this JEP is not concerned with C1 barriers, in fact `generate_c1_post_barrier_runtime_stub()` is not touched by the changeset. > Leyden needs relocation for card table base. For ease of reviewing, porting, etc. I would suggest to introduce the required changes for project Leyden as a follow-up RFE, would that work? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644210103 From ayang at openjdk.org Tue Jun 18 10:51:11 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 18 Jun 2024 10:51:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 09:49:25 GMT, Roberto Castañeda Lozano wrote: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
> > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 328: > 326: __ jcc(is_single_region, done); > 327: > 328: Assembler::Condition is_new_val_null = generate_new_val_null_test(masm, new_val); I actually kind of like the previous style that (almost) all "asm" is at the same level -- the newly introduced helper functions hinder the flow, IMO. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 479: > 477: > 478: if (g1_can_remove_pre_barrier(kit, &kit->gvn(), adr, access.type(), adr_idx)) { > 479: barriers ^= G1C2BarrierPre; Is it possible to rewrite this method to remove `^=`? This method first construct both pre/post barriers and remove if needed. I wonder if the logic will be cleaner if we track if pre/post barrier can be removed using two `bool` and construct the actual barrier in the end using two `bool` -- this way, we can avoid mutating the `barriers` var. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644242374 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644247208 From erik.osterlund at oracle.com Tue Jun 18 12:01:22 2024 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 18 Jun 2024 12:01:22 +0000 Subject: [External] : Re: Adaptable Heap Sizing for G1 GC In-Reply-To: References: <8717965B-DD60-4D97-8AA8-564194083D51@oracle.com> Message-ID: <142653B0-F170-463A-928A-F59169FD43C1@oracle.com> Hi Jonathan, Regarding alternative 1, the JVM already does this. 
The os::available_memory() and os::physical_memory() APIs detect if we are running in a container, and then query container limits. So if that's the concern, then it would seem that we already have a solution. That's what I'm using in my automatic heap sizing work for ZGC, and it's working just fine. Am I missing anything? Thanks, /Erik On 15 Jun 2024, at 00:56, Jonathan Joo wrote: Hi Erik, We had a similar vision with regards to not having to set heap sizes manually :) Agreed that with the currently proposed OpenJDK changes alone, there would be no effect for the user, just an entry point to allow for more intelligent heap sizing. We definitely do want to ship a policy that actually calculates and sets these flags, but I think a good point for discussion is *how* to ship such a policy. Note that as long as the two flags are introduced into the OpenJDK, there is always a way for people to modify the flags on their own and get AHS-like behavior. I guess the question is, to what extent do we want to take our current implementation of AHS logic, and move that from outside the JVM into the JVM? I think there are a few different possibilities, given that currently, AHS relies on internal Google services to access all the data we need. 1. Try to replicate exactly the way AHS works using APIs available from within hotspot code. For example, querying container limit and fullness information in a way that can work in any generic container environment. (Is there a good way to obtain this?) 2. Come up with a potentially less complex, but general working solution that is maintained solely within the hotspot code. The cons of this is that Google's implementation and upstream's implementation will diverge, and so there is more maintenance overhead from our end. It also won't have as robust functionality as the solution we are using at Google. 3. Don't bother with importing any AHS logic into the OpenJDK, but instead simply open-source/publish our current policies.
This would allow for people to adopt their own implementations of AHS to plug it in a way they see fit, or fiddle with our code and integrate it into their own environments. Though I agree that without access to a special launcher or other mechanism to run this code, this approach may have limited usefulness. I'm not as familiar with logistically how viable it would be to do these solutions. Would love to hear whether you think these approaches are viable, and/or any blockers you might foresee. Best, ~ Jonathan On Thu, Jun 13, 2024 at 4:17 AM Erik Osterlund wrote: Hi Jonathan, I'm currently working on automatic heap sizing for ZGC. My vision is that users shouldn't have to set heap sizes. Would love to see that in G1 as well. What you are describing sounds like it would do something similar. Having said that, it seems like the concrete changes you are proposing for OpenJDK, would not actually yield automatic heap sizing for the user. By the sound of it, you would need your special launcher with an extra thread that contains the actual heap sizing policy. The proposed JVM changes are mostly for being *able* to change the heap sizing policies externally, but without any policy shipped that actually changes it. While having a pluggable policy is great because anyone can put in their own favourite policy, there is also an obvious disadvantage that 99.9% of deployments won't have any special launcher or supplier of an external heap sizing policy, or even know what we are talking about. Therefore, unless we also ship the policies, I unfortunately think that limits the usefulness of the feature. If, however, a policy was shipped so the heap can be sized automatically, I think that would make it much more widely useful. In my automatic heap sizing work, the goal is to ship both the mechanisms and the policies needed to automatically size (and resize) the heap, adapting to changing load and environments.
Are you open to the idea of shipping a policy that actually changes the heap size as well? It would be great to be aligned on this, I think. Thanks, /Erik On 13 Jun 2024, at 01:32, Jonathan Joo wrote: Hello hotspot-dev and hotspot-gc-dev, I'd like to reopen discussion on Adaptable Heap Sizing (AHS) for the G1 Garbage Collector, since we now have some time to dedicate to bringing this effort to the OpenJDK Community. Please see https://mail.openjdk.org/pipermail/hotspot-gc-dev/2022-September/040096.html for the original thread. The bullet points contained in the above link are still largely the same, and we have made significant improvements to the service over the past few years, and found success deploying it broadly across jobs internally. Now that we feel the feature has matured, we'd like to introduce it to the OpenJDK community in hopes that it can be adopted for broader use. In short - the goal of Adaptable Heap Sizing is to improve memory usage and reduce OOMs for Java applications, especially those deployed in containerized environments. The key insights are as follows: 1. Applications with low memory requirements but configured with high RAM often use RAM unnecessarily. We can utilize GC CPU overhead metrics to help guide heap sizing, allowing for RAM savings in these scenarios. 2. For Java applications running in containers, we can bound Java heap usage based on our knowledge of the current container memory usage as well as the current container size, to prevent container OOMs. The implementation of AHS currently involves some fairly lightweight changes to the JVM, through the introduction of two new manageable flags.
They are essentially the same as these two (open feature requests): * https://bugs.openjdk.org/browse/JDK-8236073 * https://bugs.openjdk.org/browse/JDK-8204088 In addition, we have a separate thread (outside of the JVM, in our custom Java launcher) which reads in GC CPU overhead data and container information, and calculates appropriate values for these two flags. We call this the AHS worker thread, and this thread updates frequently (currently every second). The vast majority of the AHS logic is in this worker thread - the introduction of the new JVM flags above simply gives AHS a way to tune GC heuristics given this additional information. Thomas Schatzl mentioned there is a similar-sounding effort going on in ZGC, and also there were folks outside of Google who expressed interest in this project, so I think it is an appropriate time to discuss this again on an open forum. Given the positive results we've had deploying AHS internally at Google, we feel this is a valuable feature to the broader Java community that should be able to be leveraged by all to achieve more stable and efficient Java heap behavior. I'd appreciate hearing people's thoughts on this. Thank you! ~ Jonathan (P.S. For more information, a talk given about this project can be viewed here, though it is somewhat dated.) From aboldtch at openjdk.org Tue Jun 18 12:31:33 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 18 Jun 2024 12:31:33 GMT Subject: RFR: 8326820: Metadata artificially kept alive Message-ID: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive.
This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. Currently running tier1-tier8 testing. ------------- Commit messages: - Fixup comments after renaming - Rename functions with their side effects - jvmti GetAllModules requires holder to be kept alive - jvmti GetLoadedClasses requires holder to be kept alive - 8326820: Metadata artificially kept alive Changes: https://git.openjdk.org/jdk/pull/19769/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326820 Stats: 112 lines in 22 files changed: 21 ins; 26 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/19769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19769/head:pull/19769 PR: https://git.openjdk.org/jdk/pull/19769 From eosterlund at openjdk.org Tue Jun 18 13:45:17 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 18 Jun 2024 13:45:17 GMT Subject: RFR: 8326820: Metadata artificially kept alive In-Reply-To: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> References:
<1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> On Tue, 18 Jun 2024 12:25:36 GMT, Axel Boldt-Christmas wrote: > ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. > > This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. > > All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. > > Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. > > Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. > > Currently running tier1-tier8 testing. Changes requested by eosterlund (Reviewer). src/hotspot/share/classfile/classLoaderDataGraph.cpp line 245: > 243: // Iterating over the CLDG needs to be locked because > 244: // unloading can remove entries concurrently soon. 
> 245: class ClassLoaderDataGraph::ClassLoaderDataGraphIterator : public StackObj { If we now have only a no keepalive variation of this iterator implementation, then perhaps it would be a good idea to have a comment here making it clear that 1) this iterator does not keep the metadata alive and hence that 2) it is up to the user to keep oops alive manually if they are to be exposed in the object graph, or we will crash. src/hotspot/share/prims/jvmtiEnvBase.cpp line 2342: > 2340: > 2341: // Iterate over all the modules loaded to the system. > 2342: ClassLoaderDataGraph::modules_do_keepalive(&do_module); Looks like this code exposes an OopHandle backed by the CLD handle area, which isn't a strong root that the GC will start tracing from. So it would seem that we need to keep these oops alive somehow. src/hotspot/share/prims/jvmtiGetLoadedClasses.cpp line 108: > 106: // and collect them using the LoadedClassesClosure > 107: MutexLocker mcld(ClassLoaderDataGraph_lock); > 108: ClassLoaderDataGraph::loaded_classes_do_keepalive(&closure); This one looks like it might not be safe to expose without keeping the classes alive.
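The keepalive versus no-keepalive distinction discussed in these review comments can be illustrated with a toy model. This is only a sketch in plain C++ and not HotSpot code: `Holder`, `ToyCLD`, `walk_no_keepalive` and `walk_keepalive` are hypothetical names invented here, and `std::weak_ptr` merely stands in for a reference that does not keep its target alive.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for the holder of a class loader's data.
// Hypothetical type, not part of HotSpot.
struct Holder {
    std::string name;
};

// Toy stand-in for a CLD: it references its holder only weakly, so the
// holder can be reclaimed once nobody else holds a strong reference.
struct ToyCLD {
    std::weak_ptr<Holder> holder;
};

// No-keepalive walk: peeks at each entry and retains nothing.
// Returns how many holders were still alive at the time of the walk.
std::size_t walk_no_keepalive(const std::vector<ToyCLD>& graph) {
    std::size_t alive = 0;
    for (const ToyCLD& cld : graph) {
        if (!cld.holder.expired()) {
            ++alive;
        }
    }
    return alive;
}

// Keepalive walk: promotes each live holder to a strong reference owned by
// the caller, so the holders cannot be reclaimed while the result is in use.
std::vector<std::shared_ptr<Holder>> walk_keepalive(const std::vector<ToyCLD>& graph) {
    std::vector<std::shared_ptr<Holder>> pinned;
    for (const ToyCLD& cld : graph) {
        if (std::shared_ptr<Holder> h = cld.holder.lock()) {
            pinned.push_back(h);
        }
    }
    return pinned;
}
```

In this model, a caller that hands results to someone else after the walk has to use the keepalive variant (or pin the entries itself), whereas a caller that consumes everything during the walk can use the cheaper no-keepalive variant without extending any lifetimes.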
------------- PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2125542310 PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1644492836 PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1644478958 PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1644479082 From eosterlund at openjdk.org Tue Jun 18 13:45:18 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 18 Jun 2024 13:45:18 GMT Subject: RFR: 8326820: Metadata artificially kept alive In-Reply-To: <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> Message-ID: On Tue, 18 Jun 2024 13:33:59 GMT, Erik Österlund wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur.
>> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. > > src/hotspot/share/prims/jvmtiGetLoadedClasses.cpp line 108: > >> 106: // and collect them using the LoadedClassesClosure >> 107: MutexLocker mcld(ClassLoaderDataGraph_lock); >> 108: ClassLoaderDataGraph::loaded_classes_do_keepalive(&closure); > > This one looks like it might not be safe to expose without keeping the classes alive. Looks like we fetch the mirror with Klass::java_mirror() and not Klass::java_mirror_no_keepalive(). It's tempting to think that java_mirror will, unlike its evil twin function, keep the mirror alive. However, this maps to _java_mirror.resolve() vs _java_mirror.peek(). The difference between these was only a thing in single generation ZGC as the _java_mirror is an OopHandle with strong references. Single generation ZGC was the only collector that needed to keep oops alive with strong reference loads - no other collector does that. In summary, unless you use single generation ZGC, we don't seem to keep the mirrors alive that we expose from here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1644490774 From szaldana at openjdk.org Tue Jun 18 14:01:12 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 18 Jun 2024 14:01:12 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 09:59:04 GMT, Albert Mingkun Yang wrote: > I recall there's some issue for debug-build due to `ZapUnusedHeapArea` so that tests would still pass even if pretouching logic is broken/disabled in VM. > > Have you checked whether test case fails if you purposely introduce a bug in pretouch logic in VM?
Hi @albertnetymk, yes I broke `os::pretouch_memory` and confirmed the tests fail. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2176175942 From szaldana at openjdk.org Tue Jun 18 14:08:19 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 18 Jun 2024 14:08:19 GMT Subject: Integrated: 8333769: Pretouching tests dont test pretouching In-Reply-To: References: Message-ID: On Thu, 13 Jun 2024 13:24:42 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). > > We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). > > Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: > > - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. > - Running the modified test with all collectors. > > Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. > > Looking forward to your comments, > Sonia This pull request has now been integrated. 
Changeset: 8bc2fbe5 Author: Sonia Zaldana Calles Committer: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/8bc2fbe57893b110fdb5fd567df4615e7833e5ae Stats: 257 lines in 9 files changed: 178 ins; 79 del; 0 mod 8333769: Pretouching tests dont test pretouching Reviewed-by: stuefe, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/19699 From stuefe at openjdk.org Tue Jun 18 14:08:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 18 Jun 2024 14:08:18 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 13:58:16 GMT, Sonia Zaldana Calles wrote: > I recall there's some issue for debug-build due to `ZapUnusedHeapArea` so that tests would still pass even if pretouching logic is broken/disabled in VM. > > Have you checked whether test case fails if you purposely introduce a bug in pretouch logic in VM? @albertnetymk I thought `ZapUnusedHeapArea` is only for clearing after evacuation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2176190486 From ayang at openjdk.org Tue Jun 18 14:25:19 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 18 Jun 2024 14:25:19 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 13:58:16 GMT, Sonia Zaldana Calles wrote: > yes I broke os::pretouch_memory and confirmed the tests fail. I just tried implementing `os::pretouch_memory` as a bare `return`, and `test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java` still passes for Serial in a debug build. > I thought ZapUnusedHeapArea is only for clearing after evacuation. Any unused memory will be written with this value, which can be mistaken for "pretouching".
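To make the concern above concrete, here is a toy model of why an RSS-based check cannot tell zapping from pretouching. It is a sketch only, assuming (as the comment states) that zapping writes to all unused memory; `ToyHeap`, `pretouch` and `zap_unused` are hypothetical names, and real page residency is tracked by the operating system rather than by a vector of flags.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a heap that records which pages have been written at least
// once. Hypothetical type used for illustration only.
struct ToyHeap {
    std::size_t page_size;
    std::vector<bool> resident;

    ToyHeap(std::size_t pages, std::size_t page_bytes)
        : page_size(page_bytes), resident(pages, false) {}

    std::size_t size_bytes() const { return resident.size() * page_size; }

    // Any write makes the page containing `offset` resident.
    void write(std::size_t offset) { resident[offset / page_size] = true; }

    // Resident set size in bytes, as an RSS-reading test would observe it.
    std::size_t rss() const {
        std::size_t bytes = 0;
        for (bool r : resident) {
            if (r) bytes += page_size;
        }
        return bytes;
    }
};

// Pretouching: one write per page.
void pretouch(ToyHeap& heap) {
    for (std::size_t off = 0; off < heap.size_bytes(); off += heap.page_size) {
        heap.write(off);
    }
}

// Zapping: fill every word of unused memory with a marker value.
void zap_unused(ToyHeap& heap) {
    for (std::size_t off = 0; off < heap.size_bytes(); off += sizeof(void*)) {
        heap.write(off);
    }
}
```

In this model a heap that was only zapped reports the same resident size as one that was only pretouched, so a test that compares RSS against the heap size passes either way; only a heap with neither step applied shows a resident size of zero.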
------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2176232104 From stuefe at openjdk.org Tue Jun 18 14:46:31 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 18 Jun 2024 14:46:31 GMT Subject: RFR: 8322475: Extend printing for System.map [v5] In-Reply-To: References: Message-ID: > This is an expansion on the new `System.map` command introduced with JDK-8318636. > > We now print valuable information per memory region, such as: > > - the actual resident set size > - the actual number of huge pages > - the actual used page size > - the THP state of the region (was advised, is eligible, uses THP, ...) > - whether the region is shared > - whether the region had been committed (backed by swap) > - whether the region has been swapped out. > > Example output: > > > from to size rss hugetlb pgsz prot notes vm info/file > 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage > 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage > 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') > 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'non-profiled nmethods') > 0x00007f3a7c802000 - 0x00007f3a839f200... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 20 commits: - copyrights - Merge branch 'master' into System.maps-more-info - fix merge issue - Merge branch 'master' into System.maps-more-info - fix whitespace issue - wip - exhuming - Merge branch 'master' into System.maps-more-info - Merge - remove codecache name printing - ... and 10 more: https://git.openjdk.org/jdk/compare/91bd85d6...231a8a91 ------------- Changes: https://git.openjdk.org/jdk/pull/17158/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17158&range=04 Stats: 656 lines in 14 files changed: 464 ins; 107 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/17158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17158/head:pull/17158 PR: https://git.openjdk.org/jdk/pull/17158 From kvn at openjdk.org Tue Jun 18 14:48:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Jun 2024 14:48:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 10:17:47 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 297: >> >>> 295: // Do not use ExternalAddress to load 'byte_map_base', since 'byte_map_base' is NOT >>> 296: // a valid address and therefore is not properly handled by the relocation code. >>> 297: __ movptr(tmp2, (intptr_t)ct->card_table()->byte_map_base()); // tmp2 := card table base address >> >> Consider using `lea` instructions and `ExternalAddress` to generate relocation: >> >> __ lea(tmp2, ExternalAddress((address)ct->card_table()->byte_map_base())); >> >> The same in `generate_c1_post_barrier_runtime_stub()` >> >> Leyden needs relocation for card table base.
> >> Consider using lea instructions and ExternalAddress to generate relocation: > > Just for my understanding, do you mean that the comment immediately above (introduced in [JDK-8028109](https://bugs.openjdk.org/browse/JDK-8028109)): > > // Do not use ExternalAddress to load 'byte_map_base', since 'byte_map_base' is NOT > // a valid address and therefore is not properly handled by the relocation code. > > does not hold anymore in mainline? > >> The same in generate_c1_post_barrier_runtime_stub() > > Note that this JEP is not concerned with C1 barriers, in fact `generate_c1_post_barrier_runtime_stub()` is not touched by the changeset. > >> Leyden needs relocation for card table base. > > For ease of reviewing, porting, etc. I would suggest to introduce the required changes for project Leyden as a follow-up RFE, would that work? Okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644594488 From kvn at openjdk.org Tue Jun 18 14:58:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Jun 2024 14:58:15 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v5] In-Reply-To: <-EktKXV73_sjXl4zEVd4LQUO3Llq5o1dzDFPYQTitdM=.91b6e91b-d5ea-4996-b8d6-704107112722@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <-EktKXV73_sjXl4zEVd4LQUO3Llq5o1dzDFPYQTitdM=.91b6e91b-d5ea-4996-b8d6-704107112722@github.com> Message-ID: On Tue, 18 Jun 2024 07:06:39 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3.
Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 32-bit build fix. Looks good. I pushed [#19758](https://github.com/openjdk/jdk/pull/19758) changes to remove unused code in nativeInst_x86.* @jatin-bhateja, please merge latest mainline to include the changes. I will start testing after that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19042#issuecomment-2176311177 From kvn at openjdk.org Tue Jun 18 15:13:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Jun 2024 15:13:15 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. 
this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR. >> >> --- >> XDH.generateSecret performance >> before Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s >> >> after Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 22.071 ops/s >> >> with this PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ± 32.858 ops/s >> >> --- >> >> P256 performance with/without mult intrinsic: >> >> Performance before Montgomery PR: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s >> >> Performance in master without mult() intrinsic >> >> Benchmark ... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Sandhya Approved for VM changes. @TobiHartmann ran our testing and it passed. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19728#pullrequestreview-2125802405 PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2176347091 From szaldana at openjdk.org Tue Jun 18 15:26:20 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 18 Jun 2024 15:26:20 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 14:22:29 GMT, Albert Mingkun Yang wrote: > > yes I broke os::pretouch_memory and confirmed the tests fail. > > I just tried using `return` to impl `os::pretouch_memory` and `test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java` still passes for Serial in debug-build. > > > I thought ZapUnusedHeapArea is only for clearing after evacuation. > > Any unused mem will be written with this value; can be mistaken as "pretoucing". I had tried it out with the release build previously. Just tried it with the debug build and I observe the following: 4 pass and 3 fail (ZGenerational, ZSingleGen and G1). Would it be worthwhile to open a follow-up issue to disable these tests for the debug build for the passing tests and/or all of them? @albertnetymk @tstuefe ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2176376454 From duke at openjdk.org Tue Jun 18 15:34:23 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 18 Jun 2024 15:34:23 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: <01gvTW3KEfA0QBgffjs0F-EqZPdWn3MmZTnTR0vnbL0=.4018624a-c493-4aad-a8ce-c51d363edf8f@github.com> On Tue, 18 Jun 2024 15:10:37 GMT, Vladimir Kozlov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> comment from Sandhya > > @TobiHartmann ran our testing and it passed. Thanks @vnkozlov @TobiHartmann ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2176392727 From aboldtch at openjdk.org Tue Jun 18 16:14:16 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 18 Jun 2024 16:14:16 GMT Subject: RFR: 8326820: Metadata artificially kept alive In-Reply-To: <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> Message-ID: On Tue, 18 Jun 2024 13:42:24 GMT, Erik Österlund wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing.
> > src/hotspot/share/classfile/classLoaderDataGraph.cpp line 245: > >> 243: // Iterating over the CLDG needs to be locked because >> 244: // unloading can remove entries concurrently soon. >> 245: class ClassLoaderDataGraph::ClassLoaderDataGraphIterator : public StackObj { > > If we now have only a no keepalive variation of this iterator implementation, then perhaps it would be a good idea to have a comment here making it clear that 1) this iterator does not keep the metadata alive and hence that 2) it is up to the user to keep oops alive manually if they are to be exposed in the object graph, or we will crash. Yes, let's add a comment and change the name to make this clear, even though `ClassLoaderDataGraphIterator` is internal to `ClassLoaderDataGraph`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1644735596 From aboldtch at openjdk.org Tue Jun 18 16:17:30 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 18 Jun 2024 16:17:30 GMT Subject: RFR: 8326820: Metadata artificially kept alive In-Reply-To: <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> Message-ID: On Tue, 18 Jun 2024 13:33:55 GMT, Erik Österlund wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed.
>> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 2342: > >> 2340: >> 2341: // Iterate over all the modules loaded to the system. >> 2342: ClassLoaderDataGraph::modules_do_keepalive(&do_module); > > Looks like this code exposes an OopHandle backed by the CLD handle area, which isn't a strong root that the GC will start tracing from. So it would seem that we need to keep these oops alive somehow. `ClassLoaderDataGraph::modules_do_keepalive` is a `keepalive` iteration. That is, it will load the holder of the CLD.
https://github.com/openjdk/jdk/blob/08366b1244775e5892bbbb184660821e8774f37a/src/hotspot/share/classfile/classLoaderDataGraph.cpp#L302-L310 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1644739284 From aboldtch at openjdk.org Tue Jun 18 16:17:30 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 18 Jun 2024 16:17:30 GMT Subject: RFR: 8326820: Metadata artificially kept alive In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> <_gK8rjr_K_JWiLN0upBG-ZBdVUazT4Rr_CbN7rhF88E=.3857a7c8-6870-4311-b45f-8aaa09af312a@github.com> Message-ID: On Tue, 18 Jun 2024 13:41:05 GMT, Erik Österlund wrote: >> src/hotspot/share/prims/jvmtiGetLoadedClasses.cpp line 108: >> >>> 106: // and collect them using the LoadedClassesClosure >>> 107: MutexLocker mcld(ClassLoaderDataGraph_lock); >>> 108: ClassLoaderDataGraph::loaded_classes_do_keepalive(&closure); >> >> This one looks like it might not be safe to expose without keeping the classes alive. > > Looks like we fetch the mirror with Klass::java_mirror() and not Klass::java_mirror_no_keepalive(). It's tempting to think that java_mirror will, unlike its evil twin function, keep the mirror alive. However, this maps to _java_mirror.resolve() vs _java_mirror.peek(). The difference between these was only a thing in single generation ZGC as the _java_mirror is an OopHandle with strong references. Single generation ZGC was the only collector that needed to keep oops alive with strong reference loads - no other collector does that. > > In summary, unless you use single generation ZGC, we don't seem to keep the mirrors alive that we expose from here. `ClassLoaderDataGraph::loaded_classes_do_keepalive` is a `keepalive` iteration. That is, it will load the holder of the CLD.
https://github.com/openjdk/jdk/blob/08366b1244775e5892bbbb184660821e8774f37a/src/hotspot/share/classfile/classLoaderDataGraph.cpp#L328-L335 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1644738502 From jbhateja at openjdk.org Tue Jun 18 16:17:57 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 18 Jun 2024 16:17:57 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v6] In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: > Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. > > Summary of changes introduced along with this patch:- > > 1. C2 compiler register allocation support. > 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. > 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. > 4. Applicable extensions to native interface used by runtime for patching instruction. > > We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits > (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves > remaining register for special purpose. > > Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. > > We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes > found during testing.
> > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 - 32-bit build fix. - Review comments resolutions. - jvmci test failures fixes - 32-bit build fixes. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 - 32 bit build fix and enforced stack alignment constraints. - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 - ... and 4 more: https://git.openjdk.org/jdk/compare/6f860f8f...8db22672 ------------- Changes: https://git.openjdk.org/jdk/pull/19042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=05 Stats: 785 lines in 26 files changed: 601 ins; 53 del; 131 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From ihse at openjdk.org Tue Jun 18 16:19:39 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 18 Jun 2024 16:19:39 GMT Subject: RFR: 8333268: Fixes for static build [v2] In-Reply-To: References: Message-ID: > This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: > > 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). > > 2) Remove the work-arounds to exclude duplicated symbols. 
> > 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. > > The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into static-linking-progress - Merge branch 'master' into static-linking-progress - Move the exported JVM_IsStaticallyLinked to a better location - Use runtime lookup of static vs dynamic instead of #ifdef STATIC_BUILD - Copy fix for init_system_properties_values on linux - Make sure we do not try to build static libraries on Windows - 8333268: Fixes for static build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19478/files - new: https://git.openjdk.org/jdk/pull/19478/files/6b24a789..e1c46562 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19478&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19478&range=00-01 Stats: 2608 lines in 114 files changed: 1321 ins; 955 del; 332 mod Patch: https://git.openjdk.org/jdk/pull/19478.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19478/head:pull/19478 PR: https://git.openjdk.org/jdk/pull/19478 From jbhateja at openjdk.org Tue Jun 18 16:22:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 18 Jun 2024 16:22:18 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v5] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <-EktKXV73_sjXl4zEVd4LQUO3Llq5o1dzDFPYQTitdM=.91b6e91b-d5ea-4996-b8d6-704107112722@github.com> Message-ID: 
<2GY_97A9EF4sFAJiT4LdMimXj22aRUW0J4K7Z57Lq9E=.9ffa78c5-3a0b-4497-a16a-0980339a756d@github.com> On Tue, 18 Jun 2024 14:55:05 GMT, Vladimir Kozlov wrote: > Looks good. I pushed [#19758](https://github.com/openjdk/jdk/pull/19758) changes to remove unused code in nativeInst_x86.* @jatin-bhateja, please merge latest mainline to include the changes. I will start testing after that. Hi @vnkozlov , Please submit it for testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19042#issuecomment-2176497969 From sviswanathan at openjdk.org Tue Jun 18 16:52:17 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 18 Jun 2024 16:52:17 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v6] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Tue, 18 Jun 2024 16:17:57 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose.
>> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - 32-bit build fix. > - Review comments resolutions. > - jvmci test failures fixes > - 32-bit build fixes. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 > - 32 bit build fix and enforced stack alignment constraints. > - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - ... and 4 more: https://git.openjdk.org/jdk/compare/6f860f8f...8db22672 src/hotspot/cpu/x86/assembler_x86.cpp line 1679: > 1677: void Assembler::andnl(Register dst, Register src1, Address src2) { > 1678: assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported"); > 1679: assert((!needs_eevex(dst, src1) && !needs_eevex(src2.base(), src2.index())) || UseAPX, "extended gpr use requires UseAPX and UseAVX > 2"); This assert seems to be a leftover and can be removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1638629834 From kvn at openjdk.org Tue Jun 18 16:52:16 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Jun 2024 16:52:16 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v6] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Tue, 18 Jun 2024 16:17:57 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 14 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - 32-bit build fix. > - Review comments resolutions. > - jvmci test failures fixes > - 32-bit build fixes. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 > - 32 bit build fix and enforced stack alignment constraints. > - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - ... and 4 more: https://git.openjdk.org/jdk/compare/6f860f8f...8db22672 I start testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/19042#issuecomment-2176551333 From ihse at openjdk.org Tue Jun 18 17:55:20 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 18 Jun 2024 17:55:20 GMT Subject: RFR: 8333268: Fixes for static build In-Reply-To: References: Message-ID: On Thu, 30 May 2024 19:35:44 GMT, Magnus Ihse Bursie wrote: > Do os::lookup_function need to be implemented on Windows too, for symmetry, even if it is only used on Unix platforms? @AlanBateman suggested to add `lookup_function` to os_windows.cpp as well, but just let it contain ShouldNotReachHere. That sounds like a good solution to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19478#issuecomment-2176657975 From ihse at openjdk.org Tue Jun 18 18:00:13 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 18 Jun 2024 18:00:13 GMT Subject: RFR: 8333268: Fixes for static build [v2] In-Reply-To: References: Message-ID: <0dEUfxGGkUTfm3TPCNbBxREmGZScyLCXwKv9-7AFf3M=.b69446a9-0828-4a99-a677-8f948ea612b6@github.com> On Tue, 18 Jun 2024 16:19:39 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. 
They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into static-linking-progress > - Merge branch 'master' into static-linking-progress > - Move the exported JVM_IsStaticallyLinked to a better location > - Use runtime lookup of static vs dynamic instead of #ifdef STATIC_BUILD > - Copy fix for init_system_properties_values on linux > - Make sure we do not try to build static libraries on Windows > - 8333268: Fixes for static build src/hotspot/os/linux/os_linux.cpp line 605: > 603: > 604: // Get rid of /{client|server|hotspot}, if binary is libjvm.so. > 605: // Or, cut off /. @jianglizhou This code is based on changes in the Hermetic Java repo, but I do not fully understand neither the comment nor what the purpose is. If you could explain this a bit I'd be grateful. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1644855137 From rcastanedalo at openjdk.org Tue Jun 18 18:40:13 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 18 Jun 2024 18:40:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 10:44:17 GMT, Albert Mingkun Yang wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 328: > >> 326: __ jcc(is_single_region, done); >> 327: >> 328: Assembler::Condition is_new_val_null = generate_new_val_null_test(masm, new_val); > > I actually kind of like the previous style that (almost) all "asm" is at the same level -- the newly introduced helper functions hinder the flow, IMO. Note that if we want to optimize the barrier code layout (see the [JEP description](https://openjdk.org/jeps/475), *Candidate optimizations* sub-section), splitting the assembly of each barrier in at least two blocks is necessary, since we need to separate the inline from the out-of-line (barrier stub) code. And since the assembly code has to be split into multiple functions anyway, I think it makes sense to group the code by logical blocks (different barrier tests, queue insertion, etc.), as proposed in this changeset. This also improves code reuse, e.g. 
the same `generate_queue_insertion` implementation is used for the pre- and post-barriers. If you still think there is value in grouping together the blocks that can be grouped together (e.g. `generate_single_region_test` + `generate_new_val_null_test` + `generate_card_young_test`), I can prototype the refactoring and let the G1 maintainers decide which alternative is more readable/maintainable. > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 479: > >> 477: >> 478: if (g1_can_remove_pre_barrier(kit, &kit->gvn(), adr, access.type(), adr_idx)) { >> 479: barriers ^= G1C2BarrierPre; > > Is it possible to rewrite this method to remove `^=`? This method first construct both pre/post barriers and remove if needed. I wonder if the logic will be cleaner if we track if pre/post barrier can be removed using two `bool` and construct the actual barrier in the end using two `bool` -- this way, we can avoid mutating the `barriers` var. Thanks for the suggestion, I will try it out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644895421 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1644896332 From stuefe at openjdk.org Tue Jun 18 18:43:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 18 Jun 2024 18:43:15 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v2] In-Reply-To: References: Message-ID: <1rux-SZ00Jbs7dWqmfsT2Hmiplc0ndQO93PFbQTf1js=.2a609db5-66c6-4802-b90a-d3948bb3c05c@github.com> On Tue, 18 Jun 2024 09:36:12 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - feedback johan >> - Merge branch 'master' into arena-constify-memflags >> - start > > src/hotspot/share/runtime/javaThread.hpp line 1138: > >> 1136: bool is_attaching_via_jni() const { return _jni_attach_state == _attaching_via_jni; } >> 1137: bool has_attached_via_jni() const { return is_attaching_via_jni() || _jni_attach_state == _attached_via_jni; } >> 1138: inline void set_is_attaching_via_jni(); > > So the only thing I don't like about this aspect is that this has to be called at construction time, so having it be a stand-alone function call invites misuse. I'm tempted to add a factory method: > > JavaThread* JavaThread::new_attaching_thread() { > JavaThread* jt = new JavaThread(); > jt->set_is_attaching_via_jni(); > return jt; > } > > and make `set_is_attaching_via_jni()` private. > > What do you think? Yes, I like that better, too. Okay, I'll do that. > src/hotspot/share/runtime/javaThread.inline.hpp line 201: > >> 199: inline void JavaThread::set_is_attaching_via_jni() { >> 200: _jni_attach_state = _attaching_via_jni; >> 201: OrderAccess::fence(); > > No need for a fence here as this should be set before the thread has been "published" and is visible to anyone else. In contrast `set_done_attaching_via_jni` is set after the thread has been published. I had the vague notion of not wanting to tie `set_done_attaching_via_jni` to an exact point in time. But that was not well thought out anyway. With a factory method, this problem disappears.
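The factory-method idea discussed above can be sketched in isolation. This is a minimal illustration only, not the HotSpot change: `AttachingThread`, its `State` enum, and `demo_attach_sequence` are hypothetical stand-ins invented for this example, and the constructor is made private here purely to keep the sketch small.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for JavaThread; this is NOT the real HotSpot class.
class AttachingThread {
 public:
  // Factory method: the only way to obtain a thread in the "attaching" state.
  static std::unique_ptr<AttachingThread> new_attaching_thread() {
    std::unique_ptr<AttachingThread> t(new AttachingThread());
    t->set_is_attaching_via_jni();  // set before the object is handed out
    return t;
  }

  bool is_attaching_via_jni() const { return _state == State::attaching; }
  bool has_attached_via_jni() const {
    return is_attaching_via_jni() || _state == State::attached;
  }
  void set_done_attaching_via_jni() { _state = State::attached; }

 private:
  enum class State { not_attached, attaching, attached };

  AttachingThread() = default;
  // Private, so callers cannot flip the state at an arbitrary later time.
  void set_is_attaching_via_jni() { _state = State::attaching; }

  State _state = State::not_attached;
};

// Usage: the thread is already "attaching" when the caller first sees it.
inline void demo_attach_sequence() {
  auto t = AttachingThread::new_attaching_thread();
  assert(t->is_attaching_via_jni());
  t->set_done_attaching_via_jni();
  assert(!t->is_attaching_via_jni() && t->has_attached_via_jni());
}
```

Because the setter runs inside the factory, before any other code can hold a pointer to the object, the sketch mirrors the point made in the review that the state is set before the thread is "published".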
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19693#discussion_r1644897175 PR Review Comment: https://git.openjdk.org/jdk/pull/19693#discussion_r1644899449 From jonathanjoo at google.com Tue Jun 18 19:14:29 2024 From: jonathanjoo at google.com (Jonathan Joo) Date: Tue, 18 Jun 2024 12:14:29 -0700 Subject: [External] : Re: Adaptable Heap Sizing for G1 GC In-Reply-To: <142653B0-F170-463A-928A-F59169FD43C1@oracle.com> References: <8717965B-DD60-4D97-8AA8-564194083D51@oracle.com> <142653B0-F170-463A-928A-F59169FD43C1@oracle.com> Message-ID: Hi Erik, Thank you for the pointer - this does seem promising. I'll be on vacation for the next week (+ a few days), but I'll look into this more when I get back! Best, ~ Jonathan On Tue, Jun 18, 2024 at 5:01 AM Erik Osterlund wrote: > Hi Jonathan, > > Regarding alternative 1, the JVM already does this. The > os::available_memory() and os::physical_memory() APIs detect if we are > running in a container, and then queries container limits. So if that's the > concern, then it would seem that we already have a solution. That's what > I'm using in my automatic heap sizing work for ZGC, and it's working just > fine. Am I missing anything? > > Thanks, > /Erik > > On 15 Jun 2024, at 00:56, Jonathan Joo wrote: > > Hi Erik, > > We had a similar vision with regards to not having to set heap sizes > manually :) Agreed that with the currently proposed OpenJDK changes alone, > there would be no effect for the user, just an entry point to allow for > more intelligent heap sizing. > > We definitely do want to ship a policy that actually calculates and sets > these flags, but I think a good point for discussion is *how* to ship such > a policy. Note that as long as the two flags are introduced into the > OpenJDK, there is always a way for people to modify the flags on their own > and get AHS-like behavior.
I guess the question is, to what extent do we > want to take our current implementation of AHS logic, and move that from > outside the JVM into the JVM? I think there are a few different > possibilities, given that currently, AHS relies on internal Google services > to access all the data we need. > > 1. Try to replicate exactly the way AHS works using APIs available from > within hotspot code. For example, querying container limit and fullness > information in a way that can work in any generic container environment. > (Is there a good way to obtain this?) > 2. Come up with a potentially less complex, but general working solution > that is maintained solely within the hotspot code. The cons of this is that > Google's implementation and upstream's implementation will diverge, and so > there is more maintenance overhead from our end. It also won't have as > robust functionality as the solution we are using at Google. > 3. Don't bother with importing any AHS logic into the OpenJDK, but instead > simply open-source/publish our current policies. This would allow for > people to adopt their own implementations of AHS to plug it in a way they > see fit, or fiddle with our code and integrate it into their own > environments. Though I agree that without access to a special launcher or > other mechanism to run this code, this approach may have limited usefulness. > > I'm not as familiar with logistically how viable it would be to do these > solutions. Would love to hear whether you think these approaches are > viable, and/or any blockers you might foresee. > > Best, > > ~ Jonathan > > > On Thu, Jun 13, 2024 at 4:17 AM Erik Osterlund > wrote: > >> Hi Jonathan, >> >> I'm currently working on automatic heap sizing for ZGC. My vision is that >> users shouldn't have to set heap sizes. >> Would love to see that in G1 as well. What you are describing sounds like >> it would do something similar.
>> >> Having said that, it seems like the concrete changes you are proposing >> for OpenJDK, would not actually >> yield automatic heap sizing for the user. By the sound of it, you would >> need your special launcher >> with an extra thread that contains the actual heap sizing policy. The >> proposed JVM changes are mostly for >> being *able* to change the heap sizing policies externally, but without >> any policy shipped that actually >> changes it. >> >> While having a pluggable policy is great because anyone can put in their >> own favourite policy, there >> is also an obvious disadvantage that 99.9% of deployments won't have any >> special launcher or >> supplier of an external heap sizing policy, or even know what we are >> talking about. Therefore, >> unless we also ship the policies, I unfortunately think that limits the >> usefulness of the feature. >> If, however, a policy was shipped so the heap can be sized automatically, >> I think that would make it >> much more widely useful. >> >> In my automatic heap sizing work, the goal is to ship both the mechanisms >> and the policies needed >> to automatically size (and resize) the heap, adapting to changing load >> and environments. Are you >> open to the idea of shipping a policy that actually changes the heap size >> as well? It would be great >> to be aligned on this, I think. >> >> Thanks, >> /Erik >> >> On 13 Jun 2024, at 01:32, Jonathan Joo wrote: >> >> Hello hotspot-dev and hotspot-gc-dev, >> >> I'd like to reopen discussion on Adaptable Heap Sizing (AHS) for the G1 >> Garbage Collector, since we now have some time to dedicate to bringing this >> effort to the OpenJDK Community. Please see >> https://mail.openjdk.org/pipermail/hotspot-gc-dev/2022-September/040096.html >> for the original thread.
>> >> The bullet points contained in the above link are still largely the same, >> and we have made significant improvements to the service over the past few >> years, and found success deploying it broadly across jobs internally. Now >> that we feel the feature has matured, we'd like to introduce it to the >> OpenJDK community in hopes that it can be adopted for broader use. >> >> In short - the goal of Adaptable Heap Sizing is to improve memory usage >> and reduce OOMs for Java applications, especially those deployed in >> containerized environments. The key insights are as follows: >> >> >> 1. Applications with low memory requirements but configured with high >> RAM often use RAM unnecessarily. We can utilize GC CPU overhead metrics to >> help guide heap sizing, allowing for RAM savings in these scenarios. >> 2. For Java applications running in containers, we can bound Java >> heap usage based on our knowledge of the current container memory usage as >> well as the current container size, to prevent container OOMs. >> >> >> The implementation of AHS currently involves some fairly lightweight >> changes to the JVM, through the introduction of two new manageable flags. >> They are essentially the same as these two (open feature requests): >> >> - https://bugs.openjdk.org/browse/JDK-8236073 >> - https://bugs.openjdk.org/browse/JDK-8204088 >> >> >> In addition, we have a separate thread (outside of the JVM, in our custom >> Java launcher) which reads in GC CPU overhead data and container >> information, and calculates appropriate values for these two flags. We call >> this the AHS worker thread, and this thread updates frequently (currently >> every second). The vast majority of the AHS logic is in this worker thread >> - the introduction of the new JVM flags above simply gives AHS a way to >> tune GC heuristics given this additional information. 
>> >> Thomas Schatzl mentioned there is a similar-sounding effort going on in >> ZGC, and also there were >> folks outside of Google who expressed interest in this project, so I think >> it is an appropriate time to discuss this again on an open forum. Given the >> positive results we've had deploying AHS internally at Google, we feel this >> is a valuable feature to the broader Java community that should be able to >> be leveraged by all to achieve more stable and efficient Java heap behavior. >> >> I'd appreciate hearing peoples' thoughts on this. Thank you! >> >> ~ Jonathan >> >> (P.S. For more information, a talk given about this project can be viewed >> here, >> though it is somewhat dated.) >> >> >> > From gziemski at openjdk.org Tue Jun 18 20:43:10 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 18 Jun 2024 20:43:10 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 15:25:33 GMT, Thomas Stuefe wrote: > > I have the same question. Did dwarf decoder performance improve? If so, could you point me the PR? Thanks! > > I completely forgot that this had been an issue. The comment was even written by me :( > > No, Elf decoder is still slow. But I have found myself too many times staring at NMT output now trying to make sense of the offsets. Missing source info in combination with the small stack size of 4 makes investigations a pain. > > I added a simple caching mechanism to aid printing. Its pretty straight-forward, but still I am not sure it is worth the complexity. Here the numbers: > > Running all NMT jtreg tests: > > * Stock JVM (no source info): 40 seconds > * Source info: 2 min 30 seconds > * Source info + caching: 1 min 15 seconds > > I think that is acceptable. Any more intricate caching would be over the complexity-benefit line.
I simply pointed out your own old concern. If you are happy with the final performance now, then I'm good. I will look at the cache shortly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2176934980 From kvn at openjdk.org Tue Jun 18 21:54:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Jun 2024 21:54:12 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v6] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Tue, 18 Jun 2024 16:17:57 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing.
>> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - 32-bit build fix. > - Review comments resolutions. > - jvmci test failures fixes > - 32-bit build fixes. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 > - 32 bit build fix and enforced stack alignment constraints. > - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - ... and 4 more: https://git.openjdk.org/jdk/compare/6f860f8f...8db22672 My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19042#pullrequestreview-2126617659 From sviswanathan at openjdk.org Tue Jun 18 23:12:11 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 18 Jun 2024 23:12:11 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v6] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Tue, 18 Jun 2024 16:17:57 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3.
Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - 32-bit build fix. > - Review comments resolutions. > - jvmci test failures fixes > - 32-bit build fixes. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 > - 32 bit build fix and enforced stack alignment constraints. > - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 > - ... and 4 more: https://git.openjdk.org/jdk/compare/6f860f8f...8db22672 src/hotspot/cpu/x86/assembler_x86.cpp line 14315: > 14313: push2p(rbx, rdx); > 14314: // To maintain 16 byte alignment after rcx is pushed. 
> 14315: subq(rsp, 8); Just curious, why do we need to maintain 16 byte alignment here? It looks to me that the subq(rsp, 8) is not required here. The next push is not a push2. src/hotspot/cpu/x86/assembler_x86.cpp line 14327: > 14325: // from the value of rsp immediately after pusha (rsp + 16 * wordSize). > 14326: // FIXME: For APX any such direct access should also consider EGPR size > 14327: // during address compution. This comment could be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1644799787 PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1644869780 From gcao at openjdk.org Wed Jun 19 05:00:33 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jun 2024 05:00:33 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length Message-ID: Hi, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. This may lead to several jtreg test failures, see jbs issue for exception information. The reason for this problem is that when spilling vector registers into memory, the whole width of the register is used incorrectly, and MaxVectorSize should be used to handle the number of elements spilled. https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 This PR proposes to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: 1. The CSR_VLENB register of RISC-V is read-only, we can't change it to MaxVectorSize like aarch64. 2. It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might bring negative impact on performance. 3.
If MaxVectorSize equals to VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform intx MaxVectorSize = 32 {C2 product} {command line} openjdk version "24-internal" 2025-03-18 OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize intx MaxVectorSize = 32 {C2 product} {command line} openjdk version "24-internal" 2025-03-18 OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=64 -XX:+PrintFlagsFinal -version |grep MaxVectorSize OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform intx MaxVectorSize = 32 {C2 product} {command line} openjdk version "24-internal" 2025-03-18 OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ### Testing - [x] test/jdk/jdk/incubator/vector on Banana Pi BPI-F3 board (with RVV1.0) ------------- Commit messages: - 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length Changes: 
https://git.openjdk.org/jdk/pull/19785/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19785&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334505 Stats: 9 lines in 1 file changed: 1 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19785/head:pull/19785 PR: https://git.openjdk.org/jdk/pull/19785 From varadam at openjdk.org Wed Jun 19 05:42:10 2024 From: varadam at openjdk.org (Varada M) Date: Wed, 19 Jun 2024 05:42:10 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:11:26 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > fixes the test case I reran the tier1 test. There are no test failures on aix-ppc. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2177789562 From aboldtch at openjdk.org Wed Jun 19 06:14:21 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 19 Jun 2024 06:14:21 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v2] In-Reply-To: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: > ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. 
> > This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. > > All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. > > Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. > > Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. > > Currently running tier1-tier8 testing. Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Document the iterator and functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19769/files - new: https://git.openjdk.org/jdk/pull/19769/files/08366b12..0048f8bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=00-01 Stats: 28 lines in 2 files changed: 12 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19769/head:pull/19769 PR: https://git.openjdk.org/jdk/pull/19769 From stuefe at openjdk.org Wed Jun 19 06:26:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Jun 2024 06:26:13 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v2] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 20:41:01 GMT, Gerard Ziemski wrote: > > * Stock JVM (no source info): 40 seconds > > * Source info: 2 min 30 seconds > > * Source info + caching: 1 min 15 seconds > > > > I think that is acceptable. 
Any more intricate caching would be over the complexity-benefit line. > > I simply pointed out your own old concern. Haha, sure, good catch. I forgot all about this. > If you are happy with the final performance now, then I'm good. > > I will look at the cache shortly. Cool thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2177843764 From jbhateja at openjdk.org Wed Jun 19 06:34:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 19 Jun 2024 06:34:17 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v6] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: <6B5c3wkVFmuGJGSqqrALVUYCSrRkTw26ajg9JptnVD8=.4d705b48-4266-4435-adaa-395c5deec3a0@github.com> On Tue, 18 Jun 2024 17:07:27 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 >> - 32-bit build fix. >> - Review comments resolutions. >> - jvmci test failures fixes >> - 32-bit build fixes. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 >> - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 >> - 32 bit build fix and enforced stack alignment constraints. >> - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 >> - ... and 4 more: https://git.openjdk.org/jdk/compare/6f860f8f...8db22672 > > src/hotspot/cpu/x86/assembler_x86.cpp line 14315: > >> 14313: push2p(rbx, rdx); >> 14314: // To maintain 16 byte alignment after rcx is pushed. >> 14315: subq(rsp, 8); > > Just curious, why do we need to maintain 16 byte alignment here? 
It looks to me that the subq(rsp, 8) is not required here. The next push is not a push2.

It's safety padding inserted to ensure that the 16-byte stack alignment constraints are preserved for subsequent consumers of the stack after pusha.

> src/hotspot/cpu/x86/assembler_x86.cpp line 14327:
> >> 14325: // from the value of rsp immediately after pusha (rsp + 16 * wordSize).
> >> 14326: // FIXME: For APX any such direct access should also consider EGPR size
> >> 14327: // during address compution.
>
> This comment could be removed.

It's added in the context of the comment above, which talks about computing the original RSP prior to pushing the GPR state onto the stack.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1645481972
PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1645482023

From stuefe at openjdk.org Wed Jun 19 06:37:32 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 19 Jun 2024 06:37:32 GMT
Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v3]
In-Reply-To:
References:
Message-ID:

> Arenas carry NMT flags.
>
> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor.
>
> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable.
> > The patch does that: > - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) > - CompilerThread hands in mtCompiler, all other threads rely on the default > - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in > - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena > - it also allows us to make Arena::flags private > > Other, unrelated cleanups: > - Made Arena::_size_in_bytes and Arena::_tag private > - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor > - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. > > Tests: > > I manually verified that the NMT numbers printed don't change. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - feedback david - Merge branch 'master' into arena-constify-memflags - feedback johan - Merge branch 'master' into arena-constify-memflags - start ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19693/files - new: https://git.openjdk.org/jdk/pull/19693/files/1dabcc59..ac61bf42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19693&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19693&range=01-02 Stats: 1542 lines in 69 files changed: 682 ins; 672 del; 188 mod Patch: https://git.openjdk.org/jdk/pull/19693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19693/head:pull/19693 PR: https://git.openjdk.org/jdk/pull/19693 From stuefe at openjdk.org Wed Jun 19 06:37:33 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Jun 2024 06:37:33 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v2] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 09:44:01 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - feedback johan >> - Merge branch 'master' into arena-constify-memflags >> - start > > Changes requested by dholmes (Reviewer). @dholmes-ora okay, I worked in your feedback. Thank you! 
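
For readers following this thread, the design described in the RFR for 8334223 can be illustrated with a minimal, self-contained sketch: the memory flag is handed to the thread constructor, forwarded to the arenas the thread creates, and stored as a const member so that it cannot be changed afterwards. The names below (`MemFlag`, `SketchArena`, `SketchThread`, `SketchCompilerThread`) are hypothetical stand-ins chosen for illustration; they are not the HotSpot classes touched by the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's MEMFLAGS; only two values are modelled.
enum class MemFlag { mtThread, mtCompiler };

// An arena whose accounting flag is fixed at construction time.
class SketchArena {
 public:
  explicit SketchArena(MemFlag flag) : _flag(flag) {}
  MemFlag flag() const { return _flag; }

 private:
  const MemFlag _flag;  // const: the arena cannot be re-biased later
};

// The base thread takes the flag as a constructor argument (defaulting to
// mtThread) and accounts both of its arenas toward that flag.
class SketchThread {
 public:
  explicit SketchThread(MemFlag flag = MemFlag::mtThread)
      : _resource_area(flag), _handle_area(flag) {}
  virtual ~SketchThread() = default;

  const SketchArena& resource_area() const { return _resource_area; }
  const SketchArena& handle_area() const { return _handle_area; }

 private:
  SketchArena _resource_area;
  SketchArena _handle_area;
};

// The compiler thread hands in mtCompiler up front, so nothing has to be
// changed after the base-class constructor has already created the arenas.
class SketchCompilerThread : public SketchThread {
 public:
  SketchCompilerThread() : SketchThread(MemFlag::mtCompiler) {}
};
```

In this sketch a default-constructed `SketchThread` reports `mtThread` for both arenas, while a `SketchCompilerThread` reports `mtCompiler`, without any setter on the arena.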
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19693#issuecomment-2177858363

From jbhateja at openjdk.org Wed Jun 19 06:54:35 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 19 Jun 2024 06:54:35 GMT
Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v7]
In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com>
References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com>
Message-ID: <1gSd6mI4H5dykQJLrg9GSyfaW71GjWKaaO86Bwl7Maw=.99e79c49-6829-4072-a8c1-f0674b8295b6@github.com>

> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode.
>
> Summary of changes introduced along with this patch:-
>
> 1. C2 compiler register allocation support.
> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services.
> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation.
> 4. Applicable extensions to native interface used by runtime for patching instruction.
>
> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits
> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves
> remaining register for special purpose.
>
> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class.
>
> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes
> found during testing.
>
> Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19042/files - new: https://git.openjdk.org/jdk/pull/19042/files/8db22672..4ecca0f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=05-06 Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From dholmes at openjdk.org Wed Jun 19 07:01:19 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Jun 2024 07:01:19 GMT Subject: RFR: 8333769: Pretouching tests dont test pretouching [v3] In-Reply-To: References: Message-ID: <8LnAesBWHSRlqnEYUlT-LMPRLczbOxqypkaWn6uh4k0=.aa37dc11-c617-48d7-869d-5e6e1c73c47e@github.com> On Mon, 17 Jun 2024 16:09:48 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8333769](https://bugs.openjdk.org/browse/JDK-8333769). >> >> We already have a test for parallel GC that makes sure pretouching behaviour is correct ([test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/parallel/TestAlwaysPreTouchBehavior.java)). >> >> Unfortunately this test is limited to linux because of the scanning of `/proc/pid/status`. With this patch I propose two changes: >> >> - Adding a function to the os namespace `os::rss` and exposing this API via WhiteBox. This in turn allows us to generalize the above test to be used across all platforms. >> - Running the modified test with all collectors. 
>> >> Additionally, I considered removing other pre-existing pretouch tests (for example, this [z test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/gc/z/TestAlwaysPreTouch.java)), as this new test is a bit more thorough. However, I noticed that some of these tests run alongside other configurables such as varying numbers of parallel GC threads, varying heap sizes, etc. Therefore, there might not be any harm in running these tests as well. >> >> Looking forward to your comments, >> Sonia > > Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Fixing comment from KB to bytes > - Merge master > - Changes based on feedback and also adding test for serial collector > - 8333769: Pretouching tests dont test pretouching The new test is failing in our CI. I have filed [JDK-8334513](https://bugs.openjdk.org/browse/JDK-8334513) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19699#issuecomment-2177895033 From fyang at openjdk.org Wed Jun 19 07:20:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Jun 2024 07:20:11 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v7] In-Reply-To: References: Message-ID: <0fLhoUNcVyJmPZ1mTra-BI4AX2x2tbNgJvn9bpI_1Z4=.716e8e00-5a1c-4f57-ae50-b9f3e647dd0e@github.com> On Sat, 15 Jun 2024 07:06:23 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. 
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
>> Original:
>>
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
>> Seco...
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Put "secondary super table" generate code inside COMPILER2 macro

LGTM.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2127267432

From gcao at openjdk.org Wed Jun 19 07:26:31 2024
From: gcao at openjdk.org (Gui Cao)
Date: Wed, 19 Jun 2024 07:26:31 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v8]
In-Reply-To:
References:
Message-ID: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com>

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
> Original:
>
> Benchmark                                   Mode  Cnt    Score    Error  Units
> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
> SecondarySupersLookup.testNegative59        avgt   15  131.858 ±  1.066  ns/op
> SecondaryS...

Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision:

- Merge remote-tracking branch 'upstream/master' into JDK-8332587
- Put "secondary super table" generate code inside COMPILER2 macro
- Merge remote-tracking branch 'upstream/master' into JDK-8332587
- Update ins_cost for PartialSubtypeCheck
- Code Format
- Merge remote-tracking branch 'upstream/master' into JDK-8332587
- Polish Code Comment
- Merge remote-tracking branch 'upstream/master' into JDK-8332587
- Fix Code format
- Fix for Hamlin comment
- ...
and 4 more: https://git.openjdk.org/jdk/compare/65dac6b9...cd656692 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19320/files - new: https://git.openjdk.org/jdk/pull/19320/files/ec01d64b..cd656692 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=06-07 Stats: 4381 lines in 179 files changed: 2743 ins; 1063 del; 575 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From mbaesken at openjdk.org Wed Jun 19 07:31:28 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Jun 2024 07:31:28 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v3] In-Reply-To: References: Message-ID: > A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). > We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). 
> Currently something like this is used : > > #if defined(__clang__) || defined(__GNUC__) > __attribute__((no_sanitize("undefined"))) > #endif Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: - change comment - ubsan.hpp -> ub.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19722/files - new: https://git.openjdk.org/jdk/pull/19722/files/735a6871..c07f9324 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=01-02 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19722.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19722/head:pull/19722 PR: https://git.openjdk.org/jdk/pull/19722 From mbaesken at openjdk.org Wed Jun 19 07:31:28 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Jun 2024 07:31:28 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 06:49:41 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > move ATTRIBUTE_NO_UBSAN to a separate line I changed the header name from ubsan.hpp to ub.hpp . 
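
For readers following this thread, here is a minimal sketch of what such a macro could look like, built only from the attribute quoted above and the `ATTRIBUTE_NO_UBSAN` name used in this PR. It is an illustration under those assumptions, not the contents of the actual header, and `sketch_add` is a hypothetical function that exists only to show where the macro would be placed.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the macro: expands to the attribute quoted in this thread on
// clang and GCC, and to nothing on other compilers.
#if defined(__clang__) || defined(__GNUC__)
#define ATTRIBUTE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define ATTRIBUTE_NO_UBSAN
#endif

// Hypothetical example function. The macro sits on its own line before the
// declaration and only tells UBSan not to instrument this function; it does
// not change what the code computes. Callers in this sketch only pass values
// whose sum fits in int32_t.
ATTRIBUTE_NO_UBSAN
int32_t sketch_add(int32_t a, int32_t b) {
  return a + b;
}
```

The empty fallback definition keeps annotated code compiling on toolchains that do not know the attribute, which is the main convenience over repeating the `#if`/`#endif` block at every use site.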
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2177949581

From aph at openjdk.org Wed Jun 19 07:48:13 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 19 Jun 2024 07:48:13 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v8]
In-Reply-To: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com>
References: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com>
Message-ID:

On Wed, 19 Jun 2024 07:26:31 GMT, Gui Cao wrote:

>> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
>> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>>
>> ### Correctness testing:
>>
>> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
>> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
>> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
>> Original:
>>
>> Benchmark                                   Mode  Cnt    Score    Error  Units
>> SecondarySuperCacheHits.test                avgt   15   11.375 ±  0.071  ns/op
>> SecondarySuperCacheInterContention.test     avgt   15  646.087 ± 32.587  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15  600.090 ± 83.779  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15  692.084 ± 73.218  ns/op
>> SecondarySupersLookup.testNegative00        avgt   15   16.420 ±  0.239  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   18.307 ±  0.260  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   21.695 ±  0.458  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   24.855 ±  0.664  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   27.305 ±  0.522  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   29.719 ±  0.385  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   32.231 ±  0.498  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   33.747 ±  0.603  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   35.856 ±  0.629  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   37.077 ±  0.546  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   39.408 ±  0.465  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   51.041 ±  0.547  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15   58.722 ±  0.922  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15   77.310 ±  0.654  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15   81.116 ±  0.854  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15   96.311 ±  0.840  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  115.427 ±  0.838  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  124.371 ±  1.076  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  126.796 ±  0.916  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  127.952 ±  1.202  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  131.956 ±  4.515  ns/op
>> Seco...
>
> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision:
>
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Put "secondary super table" generate code inside COMPILER2 macro
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Update ins_cost for PartialSubtypeCheck
> - Code Format
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Polish Code Comment
> - Merge remote-tracking branch 'upstream/master' into JDK-8332587
> - Fix Code format
> - Fix for Hamlin comment
> - ...
and 4 more: https://git.openjdk.org/jdk/compare/23e348ba...cd656692 src/hotspot/cpu/riscv/vm_version_riscv.cpp line 210: > 208: if (!FLAG_IS_DEFAULT(UseSecondarySupersTable)) { > 209: warning("UseSecondarySupersTable is not supported on this CPU"); > 210: } Disabling `UseSecondarySupersTable` is probably not a good idea. https://github.com/openjdk/jdk/blob/48621ae193ef70b2fae4dcb7ddc524f349beb131/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4743 is a better and easier way to handle this situation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1645597500 From stefank at openjdk.org Wed Jun 19 07:52:11 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 19 Jun 2024 07:52:11 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v3] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 07:31:28 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - change comment > - ubsan.hpp -> ub.hpp I've listed some whitespace and punctuation nits. I have a question/comment about this comment: ... special or even 'dangerous', for example causing desired signals/crashes or *overflows*. Do we really intend to use ATTRIBUTE_NO_UBSAN to silence overflow issues? Don't we want to fix all of those? 
src/hotspot/share/sanitizers/ub.hpp line 25: > 23: * > 24: */ > 25: #ifndef SHARE_SANITIZERS_UB_HPP Suggestion: */ #ifndef SHARE_SANITIZERS_UB_HPP src/hotspot/share/sanitizers/ub.hpp line 33: > 31: // following function or method. > 32: // Useful if the function or method is known to do something special or even 'dangerous', for > 33: // example causing desired signals/crashes or overflows Suggestion: // example causing desired signals/crashes or overflows. src/hotspot/share/sanitizers/ub.hpp line 42: > 40: #endif > 41: > 42: Suggestion: ------------- PR Review: https://git.openjdk.org/jdk/pull/19722#pullrequestreview-2127350754 PR Review Comment: https://git.openjdk.org/jdk/pull/19722#discussion_r1645594087 PR Review Comment: https://git.openjdk.org/jdk/pull/19722#discussion_r1645595275 PR Review Comment: https://git.openjdk.org/jdk/pull/19722#discussion_r1645594391 From eosterlund at openjdk.org Wed Jun 19 07:53:10 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 19 Jun 2024 07:53:10 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v2] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Wed, 19 Jun 2024 06:14:21 GMT, Axel Boldt-Christmas wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. 
>> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Document the iterator and functions Looks good! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2127367739 From stuefe at openjdk.org Wed Jun 19 07:57:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Jun 2024 07:57:11 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v3] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 07:31:28 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with two additional commits since the last revision: > > - change comment > - ubsan.hpp -> ub.hpp good, if you take @stefank's suggestions ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19722#pullrequestreview-2127377298 From gcao at openjdk.org Wed Jun 19 08:00:16 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jun 2024 08:00:16 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v8] In-Reply-To: References: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com> Message-ID: On Wed, 19 Jun 2024 07:45:35 GMT, Andrew Haley wrote: > Disabling `UseSecondarySupersTable` is probably not a good idea. > > https://github.com/openjdk/jdk/blob/48621ae193ef70b2fae4dcb7ddc524f349beb131/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4743 > > is a better and easier way to handle this situation. Hi, The same question has been raised before. Here is the implementation and test data using scalar registers. https://github.com/openjdk/jdk/pull/19320#issuecomment-2144900070 With the jmh test data, we see that there is a performance decrease from testNegative55 to testNegative64 when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1645614507 From fyang at openjdk.org Wed Jun 19 08:11:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 19 Jun 2024 08:11:08 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length In-Reply-To: References: Message-ID: <5Dp1uHtGX5nhb1LTDJ-eCinxb5k30GAVnx9iqSjnD84=.9d029b0e-919d-40b7-aff8-0d2276380072@github.com> On Wed, 19 Jun 2024 04:21:24 GMT, Gui Cao wrote: > HI, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. 
This may lead to several jtreg test failures; see the JBS issue for the exception information. > > The reason for this problem is that when vector registers are spilled into memory, the whole width of the register is incorrectly used, whereas MaxVectorSize should be used to determine the number of elements spilled. > > https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 > > This PR proposes to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: > 1. The CSR_VLENB register of RISC-V is read-only, so we can't change it to MaxVectorSize like aarch64. > 2. It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might have a negative impact on performance. > 3. If MaxVectorSize equals VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. 
> > After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: > > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=64 -XX:+PrintFlagsFinal -vers... Yeah, most of the related RISC-V code was written under the assumption that MaxVectorSize matches `vlenb` CSR (the vector register length in bytes). I agree it will be safer to have this change for now. Also I don't think a MaxVectorSize smaller than `vlenb` would work if we want to experiment with vector register groups (LMUL > 1) some day for C2 especially when we come to vector reduction operations. 
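The behaviour visible in the quoted log (a request for 16 is overridden to 32 with a warning, while a request for 32 is accepted silently) can be sketched as below. This is an illustration only, not the patch itself: `normalize_max_vector_size` and its parameters are hypothetical names, and `vlenb` stands for the value read from the `vlenb` CSR. The log is cut off before the result for a request of 64, so treating a larger request the same way is an assumption.

```cpp
#include <cstdio>

// Hypothetical helper: returns the vector size the VM would actually use.
// `requested` models -XX:MaxVectorSize, `vlenb` models the hardware vector
// register length in bytes.
// Assumption: any mismatch is resolved in favour of vlenb, with a warning.
int normalize_max_vector_size(int requested, int vlenb) {
  if (requested != vlenb) {
    std::fprintf(stderr,
                 "warning: MaxVectorSize is set to %d on this platform\n",
                 vlenb);
  }
  return vlenb;
}
```

With the two sizes forced to agree, spill code no longer has to pick between the register width and MaxVectorSize, which is the mismatch the PR description identifies.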
------------- PR Comment: https://git.openjdk.org/jdk/pull/19785#issuecomment-2178035623 From aph at openjdk.org Wed Jun 19 08:22:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 19 Jun 2024 08:22:12 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v8] In-Reply-To: References: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com> Message-ID: <9ZIxjqK5lFNFQh6S5IIO_TM-olZpMa5rLiKWpHEXXEw=.7987809a-be5a-451e-a03e-7bc41073bc56@github.com> On Wed, 19 Jun 2024 07:56:08 GMT, Gui Cao wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 210: >> >>> 208: if (!FLAG_IS_DEFAULT(UseSecondarySupersTable)) { >>> 209: warning("UseSecondarySupersTable is not supported on this CPU"); >>> 210: } >> >> Disabling `UseSecondarySupersTable` is probably not a good idea. https://github.com/openjdk/jdk/blob/48621ae193ef70b2fae4dcb7ddc524f349beb131/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4743 is a better and easier way to handle this situation. > >> Disabling `UseSecondarySupersTable` is probably not a good idea. >> >> https://github.com/openjdk/jdk/blob/48621ae193ef70b2fae4dcb7ddc524f349beb131/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4743 >> >> is a better and easier way to handle this situation. > > Hi, The same question has been raised before. Here is the implementation and test data using scalar registers. https://github.com/openjdk/jdk/pull/19320#issuecomment-2144900070 > With the jmh test data, we see that there is a performance decrease from testNegative55 to testNegative64 when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available. What do you think? > > Disabling `UseSecondarySupersTable` is probably not a good idea. 
> > https://github.com/openjdk/jdk/blob/48621ae193ef70b2fae4dcb7ddc524f349beb131/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4743 > > > > is a better and easier way to handle this situation. > > Hi, The same question has been raised before. Here is the implementation and test data using scalar registers. [#19320 (comment)](https://github.com/openjdk/jdk/pull/19320#issuecomment-2144900070) With the jmh test data, we see that there is a performance decrease from testNegative55 to testNegative64 Such huge numbers of secondary supers don't occur in real-world code. > when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available. What do you think? Please do that test again with -XX:-UseSecondarySupersCache. There is a problem I don't quite understand which causes the benchmark to use the existing secondary_super_cache, and this gives results that are misleading. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1645665331 From rcastanedalo at openjdk.org Wed Jun 19 08:37:11 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 19 Jun 2024 08:37:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 18:37:41 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 479: >> >>> 477: >>> 478: if (g1_can_remove_pre_barrier(kit, &kit->gvn(), adr, access.type(), adr_idx)) { >>> 479: barriers ^= G1C2BarrierPre; >> >> Is it possible to rewrite this method to remove `^=`? This method first construct both pre/post barriers and remove if needed. I wonder if the logic will be cleaner if we track if pre/post barrier can be removed using two `bool` and construct the actual barrier in the end using two `bool` -- this way, we can avoid mutating the `barriers` var. 
> > Thanks for the suggestion, I will try it out. Is [this](https://github.com/robcasloz/jdk/compare/JDK-8334060-g1-late-barrier-expansion...robcasloz:jdk:JDK-8334060-g1-late-barrier-expansion-alberts-barriers-suggestion) what you had in mind? I don't have a strong opinion, but if you think the change improves readability I am happy to merge it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1645691876 From kbarrett at openjdk.org Wed Jun 19 08:41:24 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 19 Jun 2024 08:41:24 GMT Subject: RFR: 8333133: Simplify QuickSort::sort [v2] In-Reply-To: References: Message-ID: > The "idempotent" argument is removed from that function, with associated > simplifications to the implementation. Callers are updated to remove that > argument. Callers that were providing a false value are unaffected in their > behavior. The 3 callers that were providing a true value to request the > associated feature are also unaffected (other than by being made faster), > because the arrays involved don't contain any equivalent pairs. > > There are also some miscellaneous cleanups, including using the swap utility > and fixing some comments. 
> > Testing: mach5 tier1-3 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: improve find_pivot description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19464/files - new: https://git.openjdk.org/jdk/pull/19464/files/154a5ed0..5cee3b81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19464&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19464&range=00-01 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19464.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19464/head:pull/19464 PR: https://git.openjdk.org/jdk/pull/19464 From ayang at openjdk.org Wed Jun 19 08:45:17 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Jun 2024 08:45:17 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 08:34:25 GMT, Roberto Castañeda Lozano wrote: >> Thanks for the suggestion, I will try it out. > > Is [this](https://github.com/robcasloz/jdk/compare/JDK-8334060-g1-late-barrier-expansion...robcasloz:jdk:JDK-8334060-g1-late-barrier-expansion-alberts-barriers-suggestion) what you had in mind? I don't have a strong opinion, but if you think the change improves readability I am happy to merge it. Yes. Two nits: add `can_` to those two bools and unpack the final return expr, something like: int barriers = 0; if (!can_remove_pre...) { barriers |= pre; } if (!can_remove_post...) 
{ barriers |= post; } return barriers; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1645708369 From ayang at openjdk.org Wed Jun 19 08:48:11 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Jun 2024 08:48:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 18:36:50 GMT, Roberto Castañeda Lozano wrote: > This also improves code reuse In this area, I think code duplication is less of an issue -- it's more crucial that one can follow the asm flow as if reading real asm. (Of course, this is subjective; feel free to keep as is.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1645713269 From rcastanedalo at openjdk.org Wed Jun 19 08:59:10 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 19 Jun 2024 08:59:10 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 08:42:26 GMT, Albert Mingkun Yang wrote: >> Is [this](https://github.com/robcasloz/jdk/compare/JDK-8334060-g1-late-barrier-expansion...robcasloz:jdk:JDK-8334060-g1-late-barrier-expansion-alberts-barriers-suggestion) what you had in mind? I don't have a strong opinion, but if you think the change improves readability I am happy to merge it. > > Yes. > > Two nits: add `can_` to those two bools and unpack the final return expr, something like: > > > int barriers = 0; > > if (!can_remove_pre...) { > barriers |= pre; > } > if (!can_remove_post...) { > barriers |= post; > } > > return barriers; Thanks, I will do some testing before merging. 
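The shape suggested in this exchange can be sketched in isolation as follows. This is a minimal standalone illustration, not the code from the PR: the constants `BarrierPre` and `BarrierPost` and the function `select_barriers` are hypothetical stand-ins for `G1C2BarrierPre` and the logic in g1BarrierSetC2.cpp, whose real signatures are not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the G1 C2 barrier flags discussed above.
constexpr uint8_t BarrierPre  = 1 << 0;
constexpr uint8_t BarrierPost = 1 << 1;

// Build the barrier set additively from two "can remove" decisions,
// instead of starting with both barriers set and clearing bits with ^=.
uint8_t select_barriers(bool can_remove_pre_barrier,
                        bool can_remove_post_barrier) {
  uint8_t barriers = 0;
  if (!can_remove_pre_barrier) {
    barriers |= BarrierPre;
  }
  if (!can_remove_post_barrier) {
    barriers |= BarrierPost;
  }
  return barriers;
}
```

The additive form never relies on the starting value of `barriers`, whereas `^=` only clears a bit if that bit happens to be set beforehand.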
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1645729241 From mbaesken at openjdk.org Wed Jun 19 09:02:34 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Jun 2024 09:02:34 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v4] In-Reply-To: References: Message-ID: > A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). > We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). > Currently something like this is used : > > #if defined(__clang__) || defined(__GNUC__) > __attribute__((no_sanitize("undefined"))) > #endif Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: whitespace adjustments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19722/files - new: https://git.openjdk.org/jdk/pull/19722/files/c07f9324..1b451500 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=02-03 Stats: 4 lines in 1 file changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19722.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19722/head:pull/19722 PR: https://git.openjdk.org/jdk/pull/19722 From stefank at openjdk.org Wed Jun 19 09:06:15 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 19 Jun 2024 09:06:15 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v2] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Wed, 19 Jun 2024 06:14:21 GMT, Axel Boldt-Christmas wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. 
All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Document the iterator and functions Looks good. One question below. src/hotspot/share/classfile/systemDictionary.cpp line 1588: > 1586: { > 1587: MutexLocker ml(ClassLoaderDataGraph_lock); > 1588: ClassLoaderDataGraph::methods_do_no_keepalive(f); What about the call (SystemDictionary::methods_do), should we leave that name as-is or does it also need to be suffixed with no_keepalive? ------------- Marked as reviewed by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2127563591 PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1645724854 From mbaesken at openjdk.org Wed Jun 19 09:08:15 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Jun 2024 09:08:15 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v3] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 07:49:20 GMT, Stefan Karlsson wrote: > Do we really intend to use ATTRIBUTE_NO_UBSAN to silence overflow issues? Don't we want to fix all of those? Some stuff needs fixing in external libs, which can take some (maybe a lot of) time. Other code might already handle overflows, but only after the overflow has happened; so ubsan is triggered but the code is fine. For now, as a first step, I would already be happy to be able to *build* with ubsan enabled (we are close to this but not 100% there, I still have to use some patches on top of OpenJDK). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2178157829 From stefank at openjdk.org Wed Jun 19 09:15:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 19 Jun 2024 09:15:10 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v3] In-Reply-To: References: Message-ID: <5dEXB789kyn5s8I_A9HS2_777bo4VK3d_tyEnxNX7IY=.f599ead7-1e77-4a47-ad31-148f56efc292@github.com> On Wed, 19 Jun 2024 09:05:35 GMT, Matthias Baesken wrote: > > Do we really intend to use ATTRIBUTE_NO_UBSAN to silence overflow issues? Don't we want to fix all of those? > > Some stuff needs fixing in external libs, which can take some (maybe a lot of) time. Will ATTRIBUTE_NO_UBSAN be used in those cases? > Other code might already handle overflows, but only after the overflow has happened; so ubsan is triggered but the code is fine. This is the part that I'm questioning. I don't think this is true. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2178171789 From dholmes at openjdk.org Wed Jun 19 10:39:16 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Jun 2024 10:39:16 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v3] In-Reply-To: References: Message-ID: <1jfawA_vX-sPB988TtqDCKlYpqBcZdZbJT1fYjdIvmg=.a07a2483-ddb6-4ed4-b807-65392cdc7182@github.com> On Wed, 19 Jun 2024 06:37:32 GMT, Thomas Stuefe wrote: >> Arenas carry NMT flags. >> >> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. >> >> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. >> >> The patch does that: >> - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) >> - CompilerThread hands in mtCompiler, all other threads rely on the default >> - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in >> - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena >> - it also allows us to make Arena::flags private >> >> Other, unrelated cleanups: >> - Made Arena::_size_in_bytes and Arena::_tag private >> - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor >> - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. >> >> Tests: >> >> I manually verified that the NMT numbers printed don't change. 
> > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - feedback david > - Merge branch 'master' into arena-constify-memflags > - feedback johan > - Merge branch 'master' into arena-constify-memflags > - start Thanks for update - thread changes look good. ------------- PR Review: https://git.openjdk.org/jdk/pull/19693#pullrequestreview-2127823529 From aph at openjdk.org Wed Jun 19 10:49:18 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 19 Jun 2024 10:49:18 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v8] In-Reply-To: <9ZIxjqK5lFNFQh6S5IIO_TM-olZpMa5rLiKWpHEXXEw=.7987809a-be5a-451e-a03e-7bc41073bc56@github.com> References: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com> <9ZIxjqK5lFNFQh6S5IIO_TM-olZpMa5rLiKWpHEXXEw=.7987809a-be5a-451e-a03e-7bc41073bc56@github.com> Message-ID: On Wed, 19 Jun 2024 08:19:38 GMT, Andrew Haley wrote: >>> Disabling `UseSecondarySupersTable` is probably not a good idea. >>> >>> https://github.com/openjdk/jdk/blob/48621ae193ef70b2fae4dcb7ddc524f349beb131/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4743 >>> >>> is a better and easier way to handle this situation. >> >> Hi, The same question has been raised before. Here is the implementation and test data using scalar registers. https://github.com/openjdk/jdk/pull/19320#issuecomment-2144900070 >> With the jmh test data, we see that there is a performance decrease from testNegative55 to testNegative64 when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available. What do you think? > >> > Disabling `UseSecondarySupersTable` is probably not a good idea. 
>> > https://github.com/openjdk/jdk/blob/48621ae193ef70b2fae4dcb7ddc524f349beb131/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4743 >> > >> > is a better and easier way to handle this situation. >> >> Hi, The same question has been raised before. Here is the implementation and test data using scalar registers. [#19320 (comment)](https://github.com/openjdk/jdk/pull/19320#issuecomment-2144900070) With the jmh test data, we see that there is a performance decrease from testNegative55 to testNegative64 > > Such huge numbers of secondary supers don't occur in real-world code. > >> when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. Therefore, the current patch is only optimized when Zbb is available. What do you think? > > Please do that test again with -XX:-UseSecondarySupersCache. There is a problem I don't quite understand which causes the benchmark to use the existing secondary_super_cache, and this gives results that are misleading. I think I may have found the problem: the warmup loop doesn't run for long enough on some machines. Can you try something like `-wi 10` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1645908089 From mbaesken at openjdk.org Wed Jun 19 12:02:15 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Jun 2024 12:02:15 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 09:02:34 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). 
>> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > whitespace adjustments I can remove the two words 'or overflows' if you want. (There was at least a 'division by 0' example case here https://github.com/openjdk/jdk/pull/19674 where the case detected by ubsan was already handled, but the code could be reorganized to avoid it altogether; a comment in that review from the area expert: 'This whole ubsan thing is just random findings based on what tests we have. We'll never be ubsan clean and I'm not sure it is advisable in all cases.') ------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2178517419 From stefank at openjdk.org Wed Jun 19 12:32:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 19 Jun 2024 12:32:12 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 11:59:09 GMT, Matthias Baesken wrote: > I can remove the two words 'or overflows' if you want. Yes, please. Maybe there are places where overflows are OK, but I would like to get that better explained before we hint to people that they can use ATTRIBUTE_NO_UBSAN to "silence" ubsan. > the case detected by ubsan was already handled I've seen this argument a number of times in the UBSAN PRs. My concern is that I'm not convinced that you can "handle" a real UB issue. You can work around or fix them, but if you have a real UB and try to silence ubsan, then you still have the UB and the compiler is within its rights to throw away the code that "handles" the issue. Over the years we have seen a few issues where this happens: some if statements check for overflows, but the checks were unexpectedly removed by the compiler. 
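To make this concern concrete, here is a small sketch contrasting a check that runs after a signed overflow has already happened with one that avoids the overflow in the first place. It is not taken from HotSpot; both function names are made up for illustration, and whether a particular compiler really removes the first check depends on the compiler and optimization level.

```cpp
#include <climits>

// Post-hoc check: if a + b overflows, the behaviour is already undefined at
// the addition, so an optimizing compiler may assume the comparisons below
// are always false and drop them. Shown only to illustrate the pattern
// being questioned; it is well defined only for non-overflowing inputs.
bool add_checked_after(int a, int b, int* result) {
  int sum = a + b;                 // UB if the mathematical sum overflows
  if (b > 0 && sum < a) {          // may legally be optimized away
    return false;
  }
  if (b < 0 && sum > a) {          // may legally be optimized away
    return false;
  }
  *result = sum;
  return true;
}

// Pre-check: the overflow condition is tested on the operands, so the
// addition is executed only when its result is representable.
bool add_checked_before(int a, int b, int* result) {
  if (b > 0 && a > INT_MAX - b) {
    return false;
  }
  if (b < 0 && a < INT_MIN - b) {
    return false;
  }
  *result = a + b;
  return true;
}
```

Silencing ubsan on `add_checked_after` would hide the report without making the addition defined, which is the distinction drawn above between "handling" UB and fixing it.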
------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2178585093 From mbaesken at openjdk.org Wed Jun 19 12:41:43 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Jun 2024 12:41:43 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v5] In-Reply-To: References: Message-ID: > A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). > We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). > Currently something like this is used : > > #if defined(__clang__) || defined(__GNUC__) > __attribute__((no_sanitize("undefined"))) > #endif Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: adjust comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19722/files - new: https://git.openjdk.org/jdk/pull/19722/files/1b451500..41b96248 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19722.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19722/head:pull/19722 PR: https://git.openjdk.org/jdk/pull/19722 From stefank at openjdk.org Wed Jun 19 13:46:11 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 19 Jun 2024 13:46:11 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v5] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 12:41:43 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). 
>> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust comments Thanks for making the change. I think this looks good, minus one nit in the whitespace change. src/hotspot/share/sanitizers/ub.hpp line 42: > 40: #define ATTRIBUTE_NO_UBSAN > 41: #endif > 42: #endif // SHARE_SANITIZERS_UB_HPP You removed one too many blank lines here. It used to be two, now there are none. ------------- PR Review: https://git.openjdk.org/jdk/pull/19722#pullrequestreview-2128330864 PR Review Comment: https://git.openjdk.org/jdk/pull/19722#discussion_r1646229154 From mbaesken at openjdk.org Wed Jun 19 13:50:40 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 19 Jun 2024 13:50:40 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v6] In-Reply-To: References: Message-ID: > A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). > We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). 
> Currently something like this is used : > > #if defined(__clang__) || defined(__GNUC__) > __attribute__((no_sanitize("undefined"))) > #endif Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: add blank line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19722/files - new: https://git.openjdk.org/jdk/pull/19722/files/41b96248..a9900b30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19722&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19722.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19722/head:pull/19722 PR: https://git.openjdk.org/jdk/pull/19722 From stefank at openjdk.org Wed Jun 19 14:44:14 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 19 Jun 2024 14:44:14 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v6] In-Reply-To: References: Message-ID: <1VTvPu9x9u3ZES8PPqGfgiTJEEfZEGAQ9L0ZwpgS8XM=.4d7367f0-ea37-460f-bb74-a89b04df728e@github.com> On Wed, 19 Jun 2024 13:50:40 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add blank line Marked as reviewed by stefank (Reviewer). 
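For readers who have not opened the PR, the macro under review can be sketched as follows, pieced together from the attribute shown in the PR description and the `#define ATTRIBUTE_NO_UBSAN` and `#endif` lines quoted in the review comments. The exact contents of ub.hpp may differ, and `wrapping_increment` is a made-up function that only shows where the macro would be placed.

```cpp
// Sketch of the macro discussed in this thread. The non-empty definition is
// an assumption based on the attribute quoted in the PR description.
#if defined(__clang__) || defined(__GNUC__)
#define ATTRIBUTE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define ATTRIBUTE_NO_UBSAN
#endif

// Hypothetical function, present only to show the placement of the macro.
// Unsigned arithmetic wraps by definition, so nothing here is actually UB.
ATTRIBUTE_NO_UBSAN
unsigned int wrapping_increment(unsigned int value) {
  return value + 1u;
}
```

On compilers other than GCC and Clang the macro expands to nothing, so annotated functions still compile unchanged.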
------------- PR Review: https://git.openjdk.org/jdk/pull/19722#pullrequestreview-2128486035 From aboldtch at openjdk.org Wed Jun 19 15:06:25 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 19 Jun 2024 15:06:25 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v3] In-Reply-To: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: > ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. > > This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. > > All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. > > Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. > > Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. > > Currently running tier1-tier8 testing. 
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Rename and comment SystemDictionary::methods_do ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19769/files - new: https://git.openjdk.org/jdk/pull/19769/files/0048f8bd..5f29cfbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=01-02 Stats: 7 lines in 3 files changed: 2 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19769/head:pull/19769 PR: https://git.openjdk.org/jdk/pull/19769 From aboldtch at openjdk.org Wed Jun 19 15:06:25 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 19 Jun 2024 15:06:25 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v2] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Wed, 19 Jun 2024 08:53:33 GMT, Stefan Karlsson wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Document the iterator and functions > > src/hotspot/share/classfile/systemDictionary.cpp line 1588: > >> 1586: { >> 1587: MutexLocker ml(ClassLoaderDataGraph_lock); >> 1588: ClassLoaderDataGraph::methods_do_no_keepalive(f); > > What about the call (SystemDictionary::methods_do), should we leave that name as-is or does it also need to be suffixed with no_keepalive? Probably safer to add the suffix. With a comment. `static void classes_do(MetaspaceClosure* it);` Above had a scary name but it has no definition I can find. 
Seems like it was removed in [JDK-8213346](https://bugs.openjdk.org/browse/JDK-8213346) / 147fc3ed1373286f3a849bf5e8cac83deeb55a3e ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1646356371 From ihse at openjdk.org Wed Jun 19 15:11:24 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 19 Jun 2024 15:11:24 GMT Subject: RFR: 8333268: Fixes for static build [v3] In-Reply-To: References: Message-ID: > This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: > > 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). > > 2) Remove the work-arounds to exclude duplicated symbols. > > 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. > > The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Do not use partial linking when building static libraries for internal consumption ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19478/files - new: https://git.openjdk.org/jdk/pull/19478/files/e1c46562..4ab70df3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19478&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19478&range=01-02 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19478.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19478/head:pull/19478 PR: https://git.openjdk.org/jdk/pull/19478 From szaldana at openjdk.org Wed Jun 19 15:13:43 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 19 Jun 2024 15:13:43 GMT Subject: RFR: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java Message-ID: Hi all, This PR addresses [8334570](https://bugs.openjdk.org/browse/JDK-8334570) problem listing `gc/TestAlwaysPreTouchBehavior.java` until the underlying issues are resolved. Note that despite failures only showing up for some collectors (G1 on linux-ppc64le, ZSingleGen and the parallel collector on macos-aarch64), I am problem listing all collectors in case it has to do with underprovisioned machines leading to the errors trickling down to the other test cases. 
Thanks, Sonia ------------- Commit messages: - 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java Changes: https://git.openjdk.org/jdk/pull/19794/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19794&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334570 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19794/head:pull/19794 PR: https://git.openjdk.org/jdk/pull/19794 From stuefe at openjdk.org Wed Jun 19 15:13:43 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Jun 2024 15:13:43 GMT Subject: RFR: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 14:59:01 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8334570](https://bugs.openjdk.org/browse/JDK-8334570) problem listing `gc/TestAlwaysPreTouchBehavior.java` until the underlying issues are resolved. > > Note that despite failures only showing up for some collectors (G1 on linux-ppc64le, ZSingleGen and the parallel collector on macos-aarch64), I am problem listing all collectors in case it has to do with underprovisioned machines leading to the errors trickling down to the other test cases. > > Thanks, > Sonia Can you please ignore this for generic-all? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19794#issuecomment-2178941660 From ihse at openjdk.org Wed Jun 19 15:15:43 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 19 Jun 2024 15:15:43 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: > This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: > > 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. 
in practice zero, as long as a consistent naming scheme is used for exported functions). > > 2) Remove the work-arounds to exclude duplicated symbols. > > 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. > > The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Add dummy implementation of os::lookup_function for Windows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19478/files - new: https://git.openjdk.org/jdk/pull/19478/files/4ab70df3..b88d813e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19478&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19478&range=02-03 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19478.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19478/head:pull/19478 PR: https://git.openjdk.org/jdk/pull/19478 From szaldana at openjdk.org Wed Jun 19 15:17:41 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 19 Jun 2024 15:17:41 GMT Subject: RFR: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:09:42 GMT, Thomas Stuefe wrote: > Can you please ignore this for generic-all? Sure, I updated it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19794#issuecomment-2178950927 From ihse at openjdk.org Wed Jun 19 15:21:10 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 19 Jun 2024 15:21:10 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. 
They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows The reason the gtest failed was that we build a static library libgtest.a, which is linked with the gtest libjvm.so. With the changes in this PR, libgtest.a was being built using the `ld -r` + `objcopy --localize-hidden` method. This caused some weird issues with gcc, related to C++ code and the `COMDAT` object info. I failed to track down any proper solution, so instead I added a patch where the libraries that we explicitly declare as `STATIC_LIBRARIES` are linked as before, without the partial linking step. These libraries are only intended for internal consumption (that is, they are linked to and used by another, "external" library), and so the extra protection added by the partial linking is not really needed. It's a bit sad that this did not work, but it is no big deal. It won't affect files released in the image, and it will not be a regression as compared to now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19478#issuecomment-2178961562 From stefank at openjdk.org Wed Jun 19 15:24:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 19 Jun 2024 15:24:10 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v3] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Wed, 19 Jun 2024 15:06:25 GMT, Axel Boldt-Christmas wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Rename and comment SystemDictionary::methods_do Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2128591105 From ayang at openjdk.org Wed Jun 19 15:34:10 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Jun 2024 15:34:10 GMT Subject: RFR: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java [v2] In-Reply-To: References: Message-ID: <8IKivNwNm9hMUHTuQ6d7YOwY94AltB1y9JESPL50N-g=.1a4bcdad-8ceb-47c8-9f9d-38d5d637caed@github.com> On Wed, 19 Jun 2024 15:17:41 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8334570](https://bugs.openjdk.org/browse/JDK-8334570) problem listing `gc/TestAlwaysPreTouchBehavior.java` until the underlying issues are resolved. >> >> Note that despite failures only showing up for some collectors (G1 on linux-ppc64le, ZSingleGen and the parallel collector on macos-aarch64), I am problem listing all collectors in case it has to do with underprovisioned machines leading to the errors trickling down to the other test cases. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating problem list to generic-all Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19794#pullrequestreview-2128617275 From tschatzl at openjdk.org Wed Jun 19 16:02:11 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Jun 2024 16:02:11 GMT Subject: RFR: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java [v2] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:17:41 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8334570](https://bugs.openjdk.org/browse/JDK-8334570) problem listing `gc/TestAlwaysPreTouchBehavior.java` until the underlying issues are resolved. 
>> >> Note that despite failures only showing up for some collectors (G1 on linux-ppc64le, ZSingleGen and the parallel collector on macos-aarch64), I am problem listing all collectors in case it has to do with underprovisioned machines leading to the errors trickling down to the other test cases. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Updating problem list to generic-all lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19794#pullrequestreview-2128673755 From never at openjdk.org Wed Jun 19 16:14:19 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 19 Jun 2024 16:14:19 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v4] In-Reply-To: References: Message-ID: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: - fix spelling of JVMCI - Merge branch 'master' into tkr-genz - Merge remote-tracking branch 'origin/master' into tkr-genz - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz - Fix riscv compilation - 8333300: [JVMCI] add support for generational ZGC - Merge branch 'master' into tkr-genz - Merge branch 'master' into tkr-genz - Use NativeAccess to read from handles - Enable support for UseEpsilonGC - ... and 2 more: https://git.openjdk.org/jdk/compare/8464ce6d...7e2b72d5 ------------- Changes: https://git.openjdk.org/jdk/pull/19490/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19490&range=03 Stats: 245 lines in 16 files changed: 194 ins; 10 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/19490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19490/head:pull/19490 PR: https://git.openjdk.org/jdk/pull/19490 From jsjolen at openjdk.org Wed Jun 19 16:59:14 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 19 Jun 2024 16:59:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v16] In-Reply-To: References: <7Cct-1XEeiZHrWVzyS0raTYcsVSOhW2ilF15g3G41LM=.b54a1b88-db2d-44bf-868d-414c9d72059d@github.com> Message-ID: On Thu, 13 Jun 2024 08:34:54 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Return on free if is_nil() > > Some general remarks (apart from my remark about the integer wrapper), and then I wait until you say it's ready for review. > > 1) I would remove/not bother with the Allocators. There isn't much of a point, and small code is good code. The arena version is particularly questionable since Arenas don't really support arbitrary deallocation. I would not want anyone to use this allocator in real code. > > 2) I would like to have a non-growing version.
Optionally, one where I can hand in an address range, and that gets used. Could possibly be combined for simplicity (if you specify a range, it's a non-growing array). Reason: I want to be able to place stuff that needs to be address-stable, and I often need to do this in pre-allocated ranges. @tstuefe, is there anything left in this that you'd like to see? @gerard-ziemski, @afshin-zafari, may I receive a second review on this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2179128021 From jsjolen at openjdk.org Wed Jun 19 17:01:19 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 19 Jun 2024 17:01:19 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v2] In-Reply-To: References: Message-ID: <0RCl1H2ExrXJVZgH8GZp97gktiZhdeBH_83encrHH9E=.29674d56-b15f-4871-bd32-bbd31dbc8a01@github.com> On Mon, 17 Jun 2024 15:19:42 GMT, Thomas Stuefe wrote: >> Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - caching > - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information > - exclude macos from testing source info > - copyrights > - test > - JDK-8333994-NMT-call-stacks-should-show-source-information Still LGTM ------------- Marked as reviewed by jsjolen (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19655#pullrequestreview-2128760269 From sviswanathan at openjdk.org Wed Jun 19 17:03:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 19 Jun 2024 17:03:13 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v7] In-Reply-To: <1gSd6mI4H5dykQJLrg9GSyfaW71GjWKaaO86Bwl7Maw=.99e79c49-6829-4072-a8c1-f0674b8295b6@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <1gSd6mI4H5dykQJLrg9GSyfaW71GjWKaaO86Bwl7Maw=.99e79c49-6829-4072-a8c1-f0674b8295b6@github.com> Message-ID: On Wed, 19 Jun 2024 06:54:35 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64-bit general-purpose registers, also known as Extended General Purpose Registers, in IA-32e 64-bit mode. >> >> Summary of changes introduced along with this patch: >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save and restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support for the new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operations. >> 4. Applicable extensions to the native interface used by the runtime for patching instructions. >> >> We plan to address C1 register support in a subsequent patch, as there are hard upper bound allocation limits >> (currently set to r11) imposed by the existing implementation of the linear scan algorithm, after which it reserves >> the remaining registers for special purposes. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR registers by manually modifying the register sequences in the relevant allocation class.
>> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. src/hotspot/cpu/x86/assembler_x86.cpp line 11080: > 11078: void Assembler::vpgatherdd(XMMRegister dst, Address src, XMMRegister mask, int vector_len) { > 11079: assert(VM_Version::supports_avx2(), ""); > 11080: assert(!needs_eevex(src.base(), src.index()), "does not support extended gprs as BASE or INDEX of address operand"); Why this new assert in vpgatherdd? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1646520131 From jbhateja at openjdk.org Wed Jun 19 17:15:16 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 19 Jun 2024 17:15:16 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v7] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <1gSd6mI4H5dykQJLrg9GSyfaW71GjWKaaO86Bwl7Maw=.99e79c49-6829-4072-a8c1-f0674b8295b6@github.com> Message-ID: On Wed, 19 Jun 2024 17:00:34 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution. > > src/hotspot/cpu/x86/assembler_x86.cpp line 11080: > >> 11078: void Assembler::vpgatherdd(XMMRegister dst, Address src, XMMRegister mask, int vector_len) { >> 11079: assert(VM_Version::supports_avx2(), ""); >> 11080: assert(!needs_eevex(src.base(), src.index()), "does not support extended gprs as BASE or INDEX of address operand"); > > Why this new assert in vpgatherdd? 
These are not promotable to extended EVEX encoding, BASE and INDEX registers of address operand must not be EGPRs. APX support is anyways enabled for AVX512 targets currently, but still it's good to add safety assertions for completeness. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1646528050 From gcao at openjdk.org Wed Jun 19 17:17:15 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jun 2024 17:17:15 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3] In-Reply-To: References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com> Message-ID: <9P2Ui6ZClzqYrbwxNiNn4cZ7Rc9f_cM-6hfRuTHQmVY=.a61cc1c5-473f-4710-a73b-52b46c99acd4@github.com> On Mon, 10 Jun 2024 18:32:05 GMT, Hamlin Li wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Code format > > Thanks for updating! > > With the fix, although it improves the perf for testNegative63/64, but seems it brings some regression for testNegative55-62, in this sense the fix should not be taken. > I'll take another look, sorry for long waiting. @Hamlin-Li @RealFYang @theRealAph : In the case of scalar register implementations, as discussed above, `Such huge numbers of secondary supers don't occur in real-world code.`, can we add scalar register implementations to this PR as well? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2179193084 From gcao at openjdk.org Wed Jun 19 17:17:16 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jun 2024 17:17:16 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v8] In-Reply-To: References: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com> <9ZIxjqK5lFNFQh6S5IIO_TM-olZpMa5rLiKWpHEXXEw=.7987809a-be5a-451e-a03e-7bc41073bc56@github.com> Message-ID: On Wed, 19 Jun 2024 10:46:36 GMT, Andrew Haley wrote: > I think I may have found the problem: the warmup loop doesn't run for long enough on some machines. Can you try something like `-wi 10` ? Hi, Sorry for being late.

JMH tested on SOPHON SG2042 (has not Zbb)

Original(not with patch):

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   22.653 ±  0.158  ns/op
SecondarySupersLookup.testNegative01  avgt   15   23.154 ±  0.116  ns/op
SecondarySupersLookup.testNegative02  avgt   15   24.165 ±  0.138  ns/op
SecondarySupersLookup.testNegative03  avgt   15   25.177 ±  0.145  ns/op
SecondarySupersLookup.testNegative04  avgt   15   26.389 ±  0.457  ns/op
SecondarySupersLookup.testNegative05  avgt   15   27.181 ±  0.129  ns/op
SecondarySupersLookup.testNegative06  avgt   15   28.186 ±  0.146  ns/op
SecondarySupersLookup.testNegative07  avgt   15   29.185 ±  0.148  ns/op
SecondarySupersLookup.testNegative08  avgt   15   30.196 ±  0.149  ns/op
SecondarySupersLookup.testNegative09  avgt   15   31.219 ±  0.151  ns/op
SecondarySupersLookup.testNegative10  avgt   15   32.220 ±  0.162  ns/op
SecondarySupersLookup.testNegative16  avgt   15   62.332 ±  0.211  ns/op
SecondarySupersLookup.testNegative20  avgt   15   54.752 ±  4.521  ns/op
SecondarySupersLookup.testNegative30  avgt   15   64.195 ±  0.072  ns/op
SecondarySupersLookup.testNegative32  avgt   15   67.184 ±  0.074  ns/op
SecondarySupersLookup.testNegative40  avgt   15   79.235 ±  0.078  ns/op
SecondarySupersLookup.testNegative50  avgt   15   94.304 ±  0.093  ns/op
SecondarySupersLookup.testNegative55  avgt   15  101.850 ±  0.182  ns/op
SecondarySupersLookup.testNegative56  avgt   15  103.310 ±  0.099  ns/op
SecondarySupersLookup.testNegative57  avgt   15  104.864 ±  0.140  ns/op
SecondarySupersLookup.testNegative58  avgt   15  106.375 ±  0.170  ns/op
SecondarySupersLookup.testNegative59  avgt   15  108.056 ±  0.459  ns/op
SecondarySupersLookup.testNegative60  avgt   15  109.431 ±  0.142  ns/op
SecondarySupersLookup.testNegative61  avgt   15  110.859 ±  0.136  ns/op
SecondarySupersLookup.testNegative62  avgt   15  112.627 ±  0.477  ns/op
SecondarySupersLookup.testNegative63  avgt   15  113.870 ±  0.145  ns/op
SecondarySupersLookup.testNegative64  avgt   15  115.399 ±  0.145  ns/op
SecondarySupersLookup.testPositive01  avgt   15   20.129 ±  0.101  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.130 ±  0.105  ns/op
SecondarySupersLookup.testPositive03  avgt   15   20.130 ±  0.117  ns/op
SecondarySupersLookup.testPositive04  avgt   15   20.133 ±  0.106  ns/op
SecondarySupersLookup.testPositive05  avgt   15   20.144 ±  0.110  ns/op
SecondarySupersLookup.testPositive06  avgt   15   20.324 ±  0.454  ns/op
SecondarySupersLookup.testPositive07  avgt   15   20.131 ±  0.097  ns/op
SecondarySupersLookup.testPositive08  avgt   15   20.139 ±  0.102  ns/op
SecondarySupersLookup.testPositive09  avgt   15   20.135 ±  0.107  ns/op
SecondarySupersLookup.testPositive10  avgt   15   20.136 ±  0.111  ns/op
SecondarySupersLookup.testPositive16  avgt   15   20.122 ±  0.105  ns/op
SecondarySupersLookup.testPositive20  avgt   15   20.133 ±  0.106  ns/op
SecondarySupersLookup.testPositive30  avgt   15   20.147 ±  0.104  ns/op
SecondarySupersLookup.testPositive32  avgt   15   20.137 ±  0.101  ns/op
SecondarySupersLookup.testPositive40  avgt   15   20.139 ±  0.110  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.135 ±  0.120  ns/op
SecondarySupersLookup.testPositive60  avgt   15   20.133 ±  0.122  ns/op
SecondarySupersLookup.testPositive63  avgt   15   20.128 ±  0.106  ns/op
SecondarySupersLookup.testPositive64  avgt   15   20.138 ±  0.120  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   20.644 ±  0.152  ns/op
SecondarySupersLookup.testNegative01  avgt   15   20.658 ±  0.120  ns/op
SecondarySupersLookup.testNegative02  avgt   15   20.855 ±  0.472  ns/op
SecondarySupersLookup.testNegative03  avgt   15   20.848 ±  0.479  ns/op
SecondarySupersLookup.testNegative04  avgt   15   20.885 ±  0.476  ns/op
SecondarySupersLookup.testNegative05  avgt   15   20.652 ±  0.124  ns/op
SecondarySupersLookup.testNegative06  avgt   15   20.644 ±  0.122  ns/op
SecondarySupersLookup.testNegative07  avgt   15   20.651 ±  0.124  ns/op
SecondarySupersLookup.testNegative08  avgt   15   20.654 ±  0.128  ns/op
SecondarySupersLookup.testNegative09  avgt   15   20.672 ±  0.133  ns/op
SecondarySupersLookup.testNegative10  avgt   15   20.649 ±  0.127  ns/op
SecondarySupersLookup.testNegative16  avgt   15   21.072 ±  0.571  ns/op
SecondarySupersLookup.testNegative20  avgt   15   20.880 ±  0.492  ns/op
SecondarySupersLookup.testNegative30  avgt   15   20.673 ±  0.167  ns/op
SecondarySupersLookup.testNegative32  avgt   15   20.672 ±  0.165  ns/op
SecondarySupersLookup.testNegative40  avgt   15   20.678 ±  0.184  ns/op
SecondarySupersLookup.testNegative50  avgt   15   20.687 ±  0.183  ns/op
SecondarySupersLookup.testNegative55  avgt   15  119.554 ±  4.316  ns/op
SecondarySupersLookup.testNegative56  avgt   15  121.866 ±  4.324  ns/op
SecondarySupersLookup.testNegative57  avgt   15  120.363 ±  4.179  ns/op
SecondarySupersLookup.testNegative58  avgt   15  123.774 ±  4.534  ns/op
SecondarySupersLookup.testNegative59  avgt   15  122.846 ±  3.909  ns/op
SecondarySupersLookup.testNegative60  avgt   15  150.788 ±  3.687  ns/op
SecondarySupersLookup.testNegative61  avgt   15  152.192 ±  4.246  ns/op
SecondarySupersLookup.testNegative62  avgt   15  154.314 ±  4.429  ns/op
SecondarySupersLookup.testNegative63  avgt   15  211.804 ±  2.630  ns/op
SecondarySupersLookup.testNegative64  avgt   15  213.304 ±  2.362  ns/op
SecondarySupersLookup.testPositive01  avgt   15   20.334 ±  0.453  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.128 ±  0.106  ns/op
SecondarySupersLookup.testPositive03  avgt   15   20.144 ±  0.115  ns/op
SecondarySupersLookup.testPositive04  avgt   15   20.124 ±  0.101  ns/op
SecondarySupersLookup.testPositive05  avgt   15   20.134 ±  0.111  ns/op
SecondarySupersLookup.testPositive06  avgt   15   20.125 ±  0.098  ns/op
SecondarySupersLookup.testPositive07  avgt   15   20.216 ±  0.212  ns/op
SecondarySupersLookup.testPositive08  avgt   15   20.379 ±  0.462  ns/op
SecondarySupersLookup.testPositive09  avgt   15   20.145 ±  0.095  ns/op
SecondarySupersLookup.testPositive10  avgt   15   20.135 ±  0.102  ns/op
SecondarySupersLookup.testPositive16  avgt   15   20.135 ±  0.118  ns/op
SecondarySupersLookup.testPositive20  avgt   15   20.128 ±  0.110  ns/op
SecondarySupersLookup.testPositive30  avgt   15   20.322 ±  0.455  ns/op
SecondarySupersLookup.testPositive32  avgt   15   20.130 ±  0.105  ns/op
SecondarySupersLookup.testPositive40  avgt   15   20.138 ±  0.111  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.135 ±  0.113  ns/op
SecondarySupersLookup.testPositive60  avgt   15   20.138 ±  0.099  ns/op
SecondarySupersLookup.testPositive63  avgt   15   20.335 ±  0.460  ns/op
SecondarySupersLookup.testPositive64  avgt   15   20.139 ±  0.105  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

With patch and set warmup to 10 iterations

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   20.584 ±  0.041  ns/op
SecondarySupersLookup.testNegative01  avgt   15   20.607 ±  0.059  ns/op
SecondarySupersLookup.testNegative02  avgt   15   20.586 ±  0.034  ns/op
SecondarySupersLookup.testNegative03  avgt   15   20.601 ±  0.086  ns/op
SecondarySupersLookup.testNegative04  avgt   15   20.795 ±  0.452  ns/op
SecondarySupersLookup.testNegative05  avgt   15   20.579 ±  0.042  ns/op
SecondarySupersLookup.testNegative06  avgt   15   20.587 ±  0.033  ns/op
SecondarySupersLookup.testNegative07  avgt   15   20.594 ±  0.053  ns/op
SecondarySupersLookup.testNegative08  avgt   15   20.815 ±  0.451  ns/op
SecondarySupersLookup.testNegative09  avgt   15   20.571 ±  0.042  ns/op
SecondarySupersLookup.testNegative10  avgt   15   20.577 ±  0.035  ns/op
SecondarySupersLookup.testNegative16  avgt   15   20.773 ±  0.442  ns/op
SecondarySupersLookup.testNegative20  avgt   15   20.585 ±  0.046  ns/op
SecondarySupersLookup.testNegative30  avgt   15   20.780 ±  0.444  ns/op
SecondarySupersLookup.testNegative32  avgt   15   20.591 ±  0.039  ns/op
SecondarySupersLookup.testNegative40  avgt   15   20.581 ±  0.030  ns/op
SecondarySupersLookup.testNegative50  avgt   15   20.579 ±  0.034  ns/op
SecondarySupersLookup.testNegative55  avgt   15  119.795 ±  4.727  ns/op
SecondarySupersLookup.testNegative56  avgt   15  120.976 ±  4.602  ns/op
SecondarySupersLookup.testNegative57  avgt   15  124.057 ±  4.196  ns/op
SecondarySupersLookup.testNegative58  avgt   15  121.156 ±  3.756  ns/op
SecondarySupersLookup.testNegative59  avgt   15  126.792 ±  1.566  ns/op
SecondarySupersLookup.testNegative60  avgt   15  151.713 ±  3.150  ns/op
SecondarySupersLookup.testNegative61  avgt   15  151.494 ±  2.852  ns/op
SecondarySupersLookup.testNegative62  avgt   15  155.028 ±  2.996  ns/op
SecondarySupersLookup.testNegative63  avgt   15  211.724 ±  1.745  ns/op
SecondarySupersLookup.testNegative64  avgt   15  212.431 ±  1.077  ns/op
SecondarySupersLookup.testPositive01  avgt   15   20.074 ±  0.034  ns/op
SecondarySupersLookup.testPositive02  avgt   15   20.089 ±  0.037  ns/op
SecondarySupersLookup.testPositive03  avgt   15   20.086 ±  0.049  ns/op
SecondarySupersLookup.testPositive04  avgt   15   20.279 ±  0.434  ns/op
SecondarySupersLookup.testPositive05  avgt   15   20.090 ±  0.044  ns/op
SecondarySupersLookup.testPositive06  avgt   15   20.084 ±  0.038  ns/op
SecondarySupersLookup.testPositive07  avgt   15   20.281 ±  0.450  ns/op
SecondarySupersLookup.testPositive08  avgt   15   20.091 ±  0.047  ns/op
SecondarySupersLookup.testPositive09  avgt   15   20.083 ±  0.051  ns/op
SecondarySupersLookup.testPositive10  avgt   15   20.282 ±  0.454  ns/op
SecondarySupersLookup.testPositive16  avgt   15   20.086 ±  0.044  ns/op
SecondarySupersLookup.testPositive20  avgt   15   20.069 ±  0.026  ns/op
SecondarySupersLookup.testPositive30  avgt   15   20.285 ±  0.444  ns/op
SecondarySupersLookup.testPositive32  avgt   15   20.314 ±  0.450  ns/op
SecondarySupersLookup.testPositive40  avgt   15   20.089 ±  0.035  ns/op
SecondarySupersLookup.testPositive50  avgt   15   20.091 ±  0.042  ns/op
SecondarySupersLookup.testPositive60  avgt   15   20.090 ±  0.051  ns/op
SecondarySupersLookup.testPositive63  avgt   15   20.084 ±  0.047  ns/op
SecondarySupersLookup.testPositive64  avgt   15   20.284 ±  0.451  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

JMH tested on SOPHON SG2042 (has not Zbb) and use -XX:-UseSecondarySupersCache to disable UseSecondarySupersCache

Original(not with patch):

Benchmark                             Mode  Cnt    Score    Error  Units
SecondarySupersLookup.testNegative00  avgt   15   21.655 ±  0.131  ns/op
SecondarySupersLookup.testNegative01  avgt   15   22.739 ±  0.174  ns/op
SecondarySupersLookup.testNegative02  avgt   15   23.670 ±  0.116  ns/op
SecondarySupersLookup.testNegative03  avgt   15   24.674 ±  0.162  ns/op
SecondarySupersLookup.testNegative04  avgt   15   25.661 ±  0.125  ns/op
SecondarySupersLookup.testNegative05  avgt   15   26.683 ±  0.136  ns/op
SecondarySupersLookup.testNegative06  avgt   15   27.871 ±  0.461  ns/op
SecondarySupersLookup.testNegative07  avgt   15   28.669 ±  0.133  ns/op
SecondarySupersLookup.testNegative08  avgt   15   29.705 ±  0.155  ns/op
SecondarySupersLookup.testNegative09  avgt   15   46.543 ±  2.841  ns/op
SecondarySupersLookup.testNegative10  avgt   15   31.702 ±  0.157  ns/op
SecondarySupersLookup.testNegative16  avgt   15   58.564 ±  1.906  ns/op
SecondarySupersLookup.testNegative20  avgt   15   54.193 ±  2.378  ns/op
SecondarySupersLookup.testNegative30  avgt   15   66.840 ±  6.458  ns/op
SecondarySupersLookup.testNegative32  avgt   15   70.205 ±  6.984  ns/op
SecondarySupersLookup.testNegative40  avgt   15   83.552 ±  9.081  ns/op
SecondarySupersLookup.testNegative50  avgt   15  100.477 ± 11.709  ns/op
SecondarySupersLookup.testNegative55  avgt   15  108.599 ± 13.016  ns/op
SecondarySupersLookup.testNegative56  avgt   15  110.320 ± 13.335  ns/op
SecondarySupersLookup.testNegative57  avgt   15  112.058 ±
13.553 ns/op SecondarySupersLookup.testNegative58 avgt 15 113.837 ? 13.766 ns/op SecondarySupersLookup.testNegative59 avgt 15 115.353 ? 14.075 ns/op SecondarySupersLookup.testNegative60 avgt 15 117.174 ? 14.268 ns/op SecondarySupersLookup.testNegative61 avgt 15 118.633 ? 14.570 ns/op SecondarySupersLookup.testNegative62 avgt 15 120.310 ? 14.772 ns/op SecondarySupersLookup.testNegative63 avgt 15 122.178 ? 15.065 ns/op SecondarySupersLookup.testNegative64 avgt 15 123.633 ? 15.321 ns/op SecondarySupersLookup.testPositive01 avgt 15 22.652 ? 0.122 ns/op SecondarySupersLookup.testPositive02 avgt 15 23.845 ? 0.474 ns/op SecondarySupersLookup.testPositive03 avgt 15 24.649 ? 0.101 ns/op SecondarySupersLookup.testPositive04 avgt 15 25.648 ? 0.114 ns/op SecondarySupersLookup.testPositive05 avgt 15 26.838 ? 0.454 ns/op SecondarySupersLookup.testPositive06 avgt 15 27.857 ? 0.439 ns/op SecondarySupersLookup.testPositive07 avgt 15 28.885 ? 0.460 ns/op SecondarySupersLookup.testPositive08 avgt 15 29.672 ? 0.125 ns/op SecondarySupersLookup.testPositive09 avgt 15 30.696 ? 0.110 ns/op SecondarySupersLookup.testPositive10 avgt 15 31.856 ? 0.446 ns/op SecondarySupersLookup.testPositive16 avgt 15 59.366 ? 1.918 ns/op SecondarySupersLookup.testPositive20 avgt 15 54.020 ? 0.517 ns/op SecondarySupersLookup.testPositive30 avgt 15 68.943 ? 6.791 ns/op SecondarySupersLookup.testPositive32 avgt 15 70.269 ? 6.151 ns/op SecondarySupersLookup.testPositive40 avgt 15 83.555 ? 7.987 ns/op SecondarySupersLookup.testPositive50 avgt 15 99.817 ? 10.157 ns/op SecondarySupersLookup.testPositive60 avgt 15 116.173 ? 11.984 ns/op SecondarySupersLookup.testPositive63 avgt 15 120.997 ? 12.620 ns/op SecondarySupersLookup.testPositive64 avgt 15 122.597 ? 12.695 ns/op Finished running test 'micro:vm.lang.SecondarySupersLookup' With patch Benchmark Mode Cnt Score Error Units [9/1814] SecondarySupersLookup.testNegative00 avgt 15 20.140 ? 0.129 ns/op SecondarySupersLookup.testNegative01 avgt 15 20.155 ? 
0.127 ns/op SecondarySupersLookup.testNegative02 avgt 15 20.142 ? 0.116 ns/op SecondarySupersLookup.testNegative03 avgt 15 20.137 ? 0.111 ns/op SecondarySupersLookup.testNegative04 avgt 15 20.171 ? 0.125 ns/op SecondarySupersLookup.testNegative05 avgt 15 20.148 ? 0.112 ns/op SecondarySupersLookup.testNegative06 avgt 15 20.148 ? 0.137 ns/op SecondarySupersLookup.testNegative07 avgt 15 20.544 ? 0.563 ns/op SecondarySupersLookup.testNegative08 avgt 15 20.345 ? 0.462 ns/op SecondarySupersLookup.testNegative09 avgt 15 20.360 ? 0.453 ns/op SecondarySupersLookup.testNegative10 avgt 15 20.342 ? 0.458 ns/op SecondarySupersLookup.testNegative16 avgt 15 20.160 ? 0.152 ns/op SecondarySupersLookup.testNegative20 avgt 15 20.366 ? 0.477 ns/op SecondarySupersLookup.testNegative30 avgt 15 20.360 ? 0.470 ns/op SecondarySupersLookup.testNegative32 avgt 15 20.178 ? 0.155 ns/op SecondarySupersLookup.testNegative40 avgt 15 20.173 ? 0.173 ns/op SecondarySupersLookup.testNegative50 avgt 15 20.191 ? 0.181 ns/op SecondarySupersLookup.testNegative55 avgt 15 116.033 ? 1.852 ns/op SecondarySupersLookup.testNegative56 avgt 15 117.699 ? 2.857 ns/op SecondarySupersLookup.testNegative57 avgt 15 118.839 ? 1.786 ns/op SecondarySupersLookup.testNegative58 avgt 15 120.219 ? 3.305 ns/op SecondarySupersLookup.testNegative59 avgt 15 122.110 ? 3.362 ns/op SecondarySupersLookup.testNegative60 avgt 15 148.384 ? 3.282 ns/op SecondarySupersLookup.testNegative61 avgt 15 149.934 ? 2.935 ns/op SecondarySupersLookup.testNegative62 avgt 15 149.537 ? 1.754 ns/op SecondarySupersLookup.testNegative63 avgt 15 210.797 ? 1.352 ns/op SecondarySupersLookup.testNegative64 avgt 15 212.224 ? 1.642 ns/op SecondarySupersLookup.testPositive01 avgt 15 24.010 ? 0.138 ns/op SecondarySupersLookup.testPositive02 avgt 15 23.137 ? 0.110 ns/op SecondarySupersLookup.testPositive03 avgt 15 26.117 ? 0.474 ns/op SecondarySupersLookup.testPositive04 avgt 15 25.814 ? 0.606 ns/op SecondarySupersLookup.testPositive05 avgt 15 23.353 ? 
0.496 ns/op SecondarySupersLookup.testPositive06 avgt 15 30.559 ? 1.957 ns/op SecondarySupersLookup.testPositive07 avgt 15 26.078 ? 0.411 ns/op SecondarySupersLookup.testPositive08 avgt 15 36.190 ? 4.512 ns/op SecondarySupersLookup.testPositive09 avgt 15 31.033 ? 1.767 ns/op SecondarySupersLookup.testPositive10 avgt 15 24.724 ? 0.397 ns/op SecondarySupersLookup.testPositive16 avgt 15 41.952 ? 3.365 ns/op SecondarySupersLookup.testPositive20 avgt 15 32.843 ? 3.088 ns/op SecondarySupersLookup.testPositive30 avgt 15 40.470 ? 2.166 ns/op SecondarySupersLookup.testPositive32 avgt 15 70.726 ? 3.383 ns/op SecondarySupersLookup.testPositive40 avgt 15 76.331 ? 4.519 ns/op SecondarySupersLookup.testPositive50 avgt 15 73.510 ? 1.509 ns/op SecondarySupersLookup.testPositive60 avgt 15 112.032 ? 2.564 ns/op SecondarySupersLookup.testPositive63 avgt 15 215.567 ? 18.732 ns/op SecondarySupersLookup.testPositive64 avgt 15 168.949 ? 2.979 ns/op Finished running test 'micro:vm.lang.SecondarySupersLookup' With patch and set warmup to 10 iterations Benchmark Mode Cnt Score Error Units SecondarySupersLookup.testNegative00 avgt 15 20.485 ? 0.543 ns/op SecondarySupersLookup.testNegative01 avgt 15 20.103 ? 0.050 ns/op SecondarySupersLookup.testNegative02 avgt 15 20.081 ? 0.034 ns/op SecondarySupersLookup.testNegative03 avgt 15 20.087 ? 0.042 ns/op SecondarySupersLookup.testNegative04 avgt 15 20.077 ? 0.034 ns/op SecondarySupersLookup.testNegative05 avgt 15 20.085 ? 0.031 ns/op SecondarySupersLookup.testNegative06 avgt 15 20.273 ? 0.449 ns/op SecondarySupersLookup.testNegative07 avgt 15 20.076 ? 0.042 ns/op SecondarySupersLookup.testNegative08 avgt 15 20.081 ? 0.041 ns/op SecondarySupersLookup.testNegative09 avgt 15 20.095 ? 0.052 ns/op SecondarySupersLookup.testNegative10 avgt 15 20.281 ? 0.442 ns/op SecondarySupersLookup.testNegative16 avgt 15 20.085 ? 0.036 ns/op SecondarySupersLookup.testNegative20 avgt 15 20.096 ? 0.052 ns/op SecondarySupersLookup.testNegative30 avgt 15 20.075 ? 
0.032 ns/op SecondarySupersLookup.testNegative32 avgt 15 20.091 ? 0.047 ns/op SecondarySupersLookup.testNegative40 avgt 15 20.078 ? 0.027 ns/op SecondarySupersLookup.testNegative50 avgt 15 20.082 ? 0.039 ns/op SecondarySupersLookup.testNegative55 avgt 15 115.646 ? 2.057 ns/op SecondarySupersLookup.testNegative56 avgt 15 117.470 ? 1.702 ns/op SecondarySupersLookup.testNegative57 avgt 15 119.403 ? 2.837 ns/op SecondarySupersLookup.testNegative58 avgt 15 120.050 ? 3.097 ns/op SecondarySupersLookup.testNegative59 avgt 15 121.112 ? 3.306 ns/op SecondarySupersLookup.testNegative60 avgt 15 148.076 ? 1.812 ns/op SecondarySupersLookup.testNegative61 avgt 15 150.464 ? 3.034 ns/op SecondarySupersLookup.testNegative62 avgt 15 150.514 ? 2.516 ns/op SecondarySupersLookup.testNegative63 avgt 15 211.773 ? 1.382 ns/op SecondarySupersLookup.testNegative64 avgt 15 212.451 ? 2.154 ns/op SecondarySupersLookup.testPositive01 avgt 15 23.950 ? 0.118 ns/op SecondarySupersLookup.testPositive02 avgt 15 23.080 ? 0.033 ns/op SecondarySupersLookup.testPositive03 avgt 15 26.222 ? 0.457 ns/op SecondarySupersLookup.testPositive04 avgt 15 25.841 ? 0.525 ns/op SecondarySupersLookup.testPositive05 avgt 15 23.097 ? 0.059 ns/op SecondarySupersLookup.testPositive06 avgt 15 31.586 ? 2.145 ns/op SecondarySupersLookup.testPositive07 avgt 15 26.271 ? 0.567 ns/op SecondarySupersLookup.testPositive08 avgt 15 33.403 ? 0.255 ns/op SecondarySupersLookup.testPositive09 avgt 15 32.042 ? 2.643 ns/op SecondarySupersLookup.testPositive10 avgt 15 24.276 ? 0.104 ns/op SecondarySupersLookup.testPositive16 avgt 15 40.799 ? 0.489 ns/op SecondarySupersLookup.testPositive20 avgt 15 35.114 ? 4.011 ns/op SecondarySupersLookup.testPositive30 avgt 15 42.361 ? 2.791 ns/op SecondarySupersLookup.testPositive32 avgt 15 67.847 ? 0.689 ns/op SecondarySupersLookup.testPositive40 avgt 15 73.685 ? 1.336 ns/op SecondarySupersLookup.testPositive50 avgt 15 72.951 ? 0.548 ns/op SecondarySupersLookup.testPositive60 avgt 15 111.593 ? 
3.444 ns/op
SecondarySupersLookup.testPositive63 avgt 15 203.572 ? 2.176 ns/op
SecondarySupersLookup.testPositive64 avgt 15 168.671 ? 0.581 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

The JMH data shows a performance decrease for testNegative55 through testNegative64 when Zbb is not available. This is because a loop is needed to count the number of 1 bits when Zbb is not available. I tested this earlier on a Banana Pi BPI-F3 board (which has Zbb) with UseZbb disabled, so that scalar registers are used there as well, and got similar data. When Zbb is available, the results are better in all test cases except testNegative63 and testNegative64. So the performance decrease here, compared to using Zbb, is caused by counting the 1 bits with the scalar register loop.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1646528616

From jsjolen at openjdk.org Wed Jun 19 17:19:17 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 19 Jun 2024 17:19:17 GMT
Subject: RFR: 8322475: Extend printing for System.map [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 18 Jun 2024 14:46:31 GMT, Thomas Stuefe wrote:

>> This is an expansion on the new `System.map` command introduced with JDK-8318636.
>>
>> We now print valuable information per memory region, such as:
>>
>> - the actual resident set size
>> - the actual number of huge pages
>> - the actual used page size
>> - the THP state of the region (was advised, is eligible, uses THP, ...)
>> - whether the region is shared
>> - whether the region had been committed (backed by swap)
>> - whether the region has been swapped out.
>>
>> Example output:
>>
>>
>> from                 to                     size     rss hugetlb pgsz prot notes vm info/file
>> 0x00000000c0000000 - 0x00000000ffe00000 1071644672       0 4194304   2M rw-p huge  JAVAHEAP /anon_hugepage
>> 0x00000000ffe00000 - 0x0000000100000000    2097152       0       0   2M rw-p huge  JAVAHEAP /anon_hugepage
>> 0x0000558016b67000 - 0x0000558016b68000       4096    4096       0   4K r--p       /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x0000558016b68000 - 0x0000558016b69000       4096    4096       0   4K r-xp       /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java
>> 0x00007f3a749f2000 - 0x00007f3a74c62000    2555904 2555904       0   4K rwxp       CODE(CodeHeap 'profiled nmethods')
>> 0x00007f3a74c62000 - 0x00007f3a7be51000  119468032       0       0   4K ---p nores CODE(CodeHeap 'profiled nmethods')
>> 0x00007f3a7be51000 - 0x00007f3a7c1c1000    3604480 3604480       0   4K rwxp       CODE(CodeHeap 'profiled nmethods')
>> 0x00007f3a7c1c1000 - 0x00007f3a7c592000    4001792       0       0   4K ---p nores CODE(CodeHeap 'non-nmethods')
>> 0x00007f3a7c592000 - 0x00007f3a7c802000    2555904 2555904       0   4K rwxp       CODE(CodeHeap 'non-profiled nmethods') ...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits:
>
> - copyrights
> - Merge branch 'master' into System.maps-more-info
> - fix merge issue
> - Merge branch 'master' into System.maps-more-info
> - fix whitespace issue
> - wip
> - exhuming
> - Merge branch 'master' into System.maps-more-info
> - Merge
> - remove codecache name printing
> - ... and 10 more: https://git.openjdk.org/jdk/compare/91bd85d6...231a8a91

A first round with this code.

src/hotspot/os/linux/memMapPrinter_linux.cpp line 41:

> 39: size_t _vsize; // combined virtual size
> 40: size_t _rss; // combined resident set size
> 41: size_t _committed; // combined committed space

What's the difference between a "space" and a "size"?

src/hotspot/os/linux/memMapPrinter_linux.cpp line 83:

> 81: outputStream* st = _session.out();
> 82: #define INDENT_BY(n) \
> 83: if (st->fill_to(n) == 0) { \

`fill_to` returns `void`, am I missing something?

src/hotspot/os/linux/memMapPrinter_linux.cpp line 161:

> 159:
> 160: void MemMapPrinter::pd_print_all_mappings(const MappingPrintSession& session) {
> 161: constexpr char filename[] = "/proc/self/smaps";

Is this non-constexpr if it's a `const char *` instead of `char[]`?

src/hotspot/os/linux/procMapsParser.cpp line 95:

> 93: SCAN(nh);
> 94: #undef SCAN
> 95: return;

Style: Doesn't matter if the return is here or not.

src/hotspot/os/linux/procMapsParser.inline.hpp line 1:

> 1: /*

Is it important to have these in an inline file? They're very small, seems like they can just be in the header.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17158#pullrequestreview-2128766725
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1646521846
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1646523960
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1646524764
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1646528449
PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1646530621

From mli at openjdk.org Wed Jun 19 17:45:12 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 19 Jun 2024 17:45:12 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v3]
In-Reply-To:
References: <4BgBBn_Oqhipw03h7BA7ZV4ZbhnMhdDqXU94Z-IFshs=.bd949215-5dd8-43a6-ac4e-348feee4853b@github.com>
Message-ID:

On Mon, 10 Jun 2024 18:32:05 GMT, Hamlin Li wrote:

>> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix Code format
>
> Thanks for updating!
>
> Although the fix improves the performance of testNegative63/64, it seems to bring some regression for testNegative55-62; in this sense the fix should not be taken.
> I'll take another look, sorry for the long wait.

> @Hamlin-Li @RealFYang @theRealAph : In the case of scalar register implementations, as discussed above, `Such huge numbers of secondary supers don't occur in real-world code.`, can we add scalar register implementations to this PR as well?

I suggest adding a scalar version (i.e. use population_count instead when Zbb is not supported).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2179231802

From kbarrett at openjdk.org Wed Jun 19 18:59:17 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 19 Jun 2024 18:59:17 GMT
Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 19 Jun 2024 13:50:40 GMT, Matthias Baesken wrote:

>> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, even though what is done there is still valid).
>> We can simplify this by introducing a macro (similar to the asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp).
>> Currently something like this is used:
>>
>> #if defined(__clang__) || defined(__GNUC__)
>> __attribute__((no_sanitize("undefined")))
>> #endif
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
>
> add blank line

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19722#pullrequestreview-2128905659

From sviswanathan at openjdk.org Wed Jun 19 21:02:17 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 19 Jun 2024 21:02:17 GMT
Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v7]
In-Reply-To:
References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <1gSd6mI4H5dykQJLrg9GSyfaW71GjWKaaO86Bwl7Maw=.99e79c49-6829-4072-a8c1-f0674b8295b6@github.com>
Message-ID:

On Wed, 19 Jun 2024 17:11:27 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/assembler_x86.cpp line 11080:
>>
>>> 11078: void Assembler::vpgatherdd(XMMRegister dst, Address src, XMMRegister mask, int vector_len) {
>>> 11079: assert(VM_Version::supports_avx2(), "");
>>> 11080: assert(!needs_eevex(src.base(), src.index()), "does not support extended gprs as BASE or INDEX of address operand");
>>
>> Why this new assert in vpgatherdd?
>
> These are not promotable to extended EVEX encoding; the BASE and INDEX registers of the address operand must not be EGPRs. APX support is currently enabled for AVX512 targets anyway, but it is still good to add safety assertions for completeness.

But the index here is an xmm register, which could be xmm16 to xmm31, so the assert needs to be corrected. It would also be good to have a similar assert in vpgatherdq.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1646684706

From szaldana at openjdk.org Thu Jun 20 01:39:19 2024
From: szaldana at openjdk.org (Sonia Zaldana Calles)
Date: Thu, 20 Jun 2024 01:39:19 GMT
Subject: Integrated: 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java
In-Reply-To:
References:
Message-ID:

On Wed, 19 Jun 2024 14:59:01 GMT, Sonia Zaldana Calles wrote:

> Hi all,
>
> This PR addresses [8334570](https://bugs.openjdk.org/browse/JDK-8334570) by problem listing `gc/TestAlwaysPreTouchBehavior.java` until the underlying issues are resolved.
> > Note that despite failures only showing up for some collectors (G1 on linux-ppc64le, ZSingleGen and the parallel collector on macos-aarch64), I am problem listing all collectors in case it has to do with underprovisioned machines leading to the errors trickling down to the other test cases. > > Thanks, > Sonia This pull request has now been integrated. Changeset: b211929e Author: Sonia Zaldana Calles Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/b211929e05c0acdf7343c3edd025749d573c67b3 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod 8334570: Problem list gc/TestAlwaysPreTouchBehavior.java Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/19794 From gcao at openjdk.org Thu Jun 20 02:07:44 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 20 Jun 2024 02:07:44 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v9] In-Reply-To: References: Message-ID: > Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. > This optimization depends on availability of the Zbb extension which has the cpop instruction. > > ### Correctness testing: > > - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) > - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) > - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs > Original: > > Benchmark Mode Cnt Score Error Units > SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op > SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 
0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ? 1.066 ns/op > SecondaryS... 
Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Add population_count for scalar version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19320/files - new: https://git.openjdk.org/jdk/pull/19320/files/cd656692..4c829c86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=07-08 Stats: 41 lines in 3 files changed: 31 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320 PR: https://git.openjdk.org/jdk/pull/19320 From fyang at openjdk.org Thu Jun 20 02:24:17 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Jun 2024 02:24:17 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v9] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 02:07:44 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. >> >> ### Correctness testing: >> >> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) >> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) >> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` >> >> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs >> Original: >> >> Benchmark Mode Cnt Score Error Units >> SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op >> SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ? 73.218 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 
0.260 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op >> Seco... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add population_count for scalar version Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3614: > 3612: } > 3613: > 3614: void MacroAssembler::population_count(Register dst, Register src, Better to add a comment for this function, like: // population_count variant for running without the CPOP // instruction, which was introduced with Zbb extension. 
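
For illustration, the kind of scalar fallback this review discusses can be sketched in plain C++. This is a minimal sketch under stated assumptions: it assumes the loop clears the lowest set bit on each iteration (the `x & (x - 1)` idiom), the name `population_count_sketch` is hypothetical, and it is not the HotSpot `MacroAssembler::population_count` code from the PR.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not the HotSpot implementation.
// Counts the 1 bits in x without a hardware popcount (CPOP) instruction by
// clearing the lowest set bit once per loop iteration.
inline unsigned population_count_sketch(std::uint64_t x) {
  unsigned count = 0;
  while (x != 0) {
    x &= x - 1;  // x - 1 flips the lowest set bit and the zeros below it
    ++count;     // one iteration per set bit
  }
  return count;
}
```

The loop runs once per set bit, which is consistent with the observation in this thread that the scalar path costs more as the number of 1 bits to count grows.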

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3635:

> 3633: addi(dst, dst, 1);
> 3634: mv(tmp2, tmp1);
> 3635: addi(tmp2, tmp2, -1);

Suggestion: `addi(tmp2, tmp1, -1);`. This helps save the preceding `mv` instruction.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3699:

> 3697: if (bit != 0) {
> 3698: slli(r_array_index, r_bitmap, (Klass::SECONDARY_SUPERS_TABLE_MASK - bit));
> 3699: population_count(r_array_index, r_array_index, t0, tmp1);

Suggestion: `population_count(r_array_index, r_array_index, tmp1, tmp2);`

-------------

PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2129257919
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1646839166
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1646843823
PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1646839653

From gcao at openjdk.org Thu Jun 20 03:18:41 2024
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 20 Jun 2024 03:18:41 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v10]
In-Reply-To:
References:
Message-ID:

> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64.
> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>
> ### Correctness testing:
>
> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release)
> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug)
> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers`
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs
> Original:
>
> Benchmark Mode Cnt Score Error Units
> SecondarySuperCacheHits.test avgt 15 11.375 ? 0.071 ns/op
> SecondarySuperCacheInterContention.test avgt 15 646.087 ? 32.587 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ? 83.779 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ?
73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ? 0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ? 0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ? 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ? 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ? 0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ? 0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ? 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ? 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ? 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ? 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ? 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ? 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ? 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ? 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ? 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ? 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ? 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ? 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ? 0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ? 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ? 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ? 1.066 ns/op > SecondaryS... 

Gui Cao has updated the pull request incrementally with one additional commit since the last revision:

Add comment and fix population_count

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/19320/files
- new: https://git.openjdk.org/jdk/pull/19320/files/4c829c86..0e81d27e

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=09
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=19320&range=08-09

Stats: 5 lines in 1 file changed: 2 ins; 1 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/19320.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19320/head:pull/19320

PR: https://git.openjdk.org/jdk/pull/19320

From gcao at openjdk.org Thu Jun 20 03:24:17 2024
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 20 Jun 2024 03:24:17 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v9]
In-Reply-To:
References:
Message-ID:

On Thu, 20 Jun 2024 02:20:18 GMT, Fei Yang wrote:

>> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add population_count for scalar version
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3635:
>
>> 3633: addi(dst, dst, 1);
>> 3634: mv(tmp2, tmp1);
>> 3635: addi(tmp2, tmp2, -1);
>
> Suggestion: `addi(tmp2, tmp1, -1);`. This helps save the preceding `mv` instruction.

Thanks for the review. I've rerun the JMH benchmark.

JMH tested on SOPHON SG2042 (which does not have Zbb) with patch:

Benchmark Mode Cnt Score Error Units
SecondarySupersLookup.testNegative00 avgt 15 20.649 ? 0.147 ns/op
SecondarySupersLookup.testNegative01 avgt 15 20.649 ? 0.117 ns/op
SecondarySupersLookup.testNegative02 avgt 15 20.637 ? 0.116 ns/op
SecondarySupersLookup.testNegative03 avgt 15 20.638 ? 0.113 ns/op
SecondarySupersLookup.testNegative04 avgt 15 20.638 ? 0.127 ns/op
SecondarySupersLookup.testNegative05 avgt 15 20.639 ? 0.115 ns/op
SecondarySupersLookup.testNegative06 avgt 15 20.638 ? 0.119 ns/op
SecondarySupersLookup.testNegative07 avgt 15 20.850 ?
0.457 ns/op SecondarySupersLookup.testNegative08 avgt 15 20.842 ? 0.459 ns/op SecondarySupersLookup.testNegative09 avgt 15 20.650 ? 0.124 ns/op SecondarySupersLookup.testNegative10 avgt 15 20.642 ? 0.127 ns/op SecondarySupersLookup.testNegative16 avgt 15 20.657 ? 0.157 ns/op SecondarySupersLookup.testNegative20 avgt 15 20.669 ? 0.152 ns/op SecondarySupersLookup.testNegative30 avgt 15 20.668 ? 0.166 ns/op SecondarySupersLookup.testNegative32 avgt 15 20.669 ? 0.168 ns/op SecondarySupersLookup.testNegative40 avgt 15 20.668 ? 0.174 ns/op SecondarySupersLookup.testNegative50 avgt 15 20.682 ? 0.194 ns/op SecondarySupersLookup.testNegative55 avgt 15 113.369 ? 3.792 ns/op SecondarySupersLookup.testNegative56 avgt 15 113.888 ? 3.769 ns/op SecondarySupersLookup.testNegative57 avgt 15 115.320 ? 4.271 ns/op SecondarySupersLookup.testNegative58 avgt 15 115.648 ? 2.985 ns/op SecondarySupersLookup.testNegative59 avgt 15 117.730 ? 3.370 ns/op SecondarySupersLookup.testNegative60 avgt 15 142.533 ? 3.636 ns/op SecondarySupersLookup.testNegative61 avgt 15 144.901 ? 5.267 ns/op SecondarySupersLookup.testNegative62 avgt 15 145.926 ? 3.799 ns/op SecondarySupersLookup.testNegative63 avgt 15 207.704 ? 5.370 ns/op SecondarySupersLookup.testNegative64 avgt 15 210.631 ? 3.832 ns/op SecondarySupersLookup.testPositive01 avgt 15 20.334 ? 0.455 ns/op SecondarySupersLookup.testPositive02 avgt 15 20.126 ? 0.101 ns/op SecondarySupersLookup.testPositive03 avgt 15 20.126 ? 0.097 ns/op SecondarySupersLookup.testPositive04 avgt 15 20.124 ? 0.102 ns/op SecondarySupersLookup.testPositive05 avgt 15 20.119 ? 0.100 ns/op SecondarySupersLookup.testPositive06 avgt 15 20.126 ? 0.098 ns/op SecondarySupersLookup.testPositive07 avgt 15 20.321 ? 0.462 ns/op SecondarySupersLookup.testPositive08 avgt 15 20.117 ? 0.098 ns/op SecondarySupersLookup.testPositive09 avgt 15 20.534 ? 0.555 ns/op SecondarySupersLookup.testPositive10 avgt 15 20.120 ? 0.100 ns/op SecondarySupersLookup.testPositive16 avgt 15 20.125 ? 
0.104 ns/op SecondarySupersLookup.testPositive20 avgt 15 20.125 ? 0.116 ns/op SecondarySupersLookup.testPositive30 avgt 15 20.132 ? 0.110 ns/op SecondarySupersLookup.testPositive32 avgt 15 20.328 ? 0.449 ns/op SecondarySupersLookup.testPositive40 avgt 15 20.132 ? 0.096 ns/op SecondarySupersLookup.testPositive50 avgt 15 20.331 ? 0.460 ns/op SecondarySupersLookup.testPositive60 avgt 15 20.134 ? 0.104 ns/op SecondarySupersLookup.testPositive63 avgt 15 20.128 ? 0.104 ns/op SecondarySupersLookup.testPositive64 avgt 15 20.334 ? 0.456 ns/op Finished running test 'micro:vm.lang.SecondarySupersLookup' The JMH test data is a little better than before, Thanks a lot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1646875539 From fyang at openjdk.org Thu Jun 20 04:12:16 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Jun 2024 04:12:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v12] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 13:25:32 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code chrun/fragmentation. >> So whatever or not you get hot fast calls rely on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitsfalls of cmodx. 
>> >> This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Review comments, removed dead code. > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Remove tmp file > - Prepare for dynamic NativeCall size > - Only allow one calling convetion, i.e. fixed sized > - Merge branch 'master' into 8332689 > - ... 
and 8 more: https://git.openjdk.org/jdk/compare/cc64aeac...f1dd3e16 src/hotspot/cpu/riscv/codeBuffer_riscv.cpp line 74: > 72: > 73: assert(requests->number_of_entries() >= 1, "at least one"); > 74: const int total_requested_size = MacroAssembler::max_patchable_far_call_stub_size() * requests->number_of_entries(); I see mixed uses of bot `MacroAssembler::max_patchable_far_call_stub_size()` and `MacroAssembler::NativeShortCall::trampoline_size` in this function. As this is only used under `UseTrampolines`, seems more reasonable to use `MacroAssembler::NativeShortCall::trampoline_size`. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3932: > 3930: return instruction_size + MacroAssembler::NativeShortCall::trampoline_size; > 3931: } > 3932: return 2 * wordSize; Seems to me that this should be `instruction_size + wordSize`? That is one possible nop for alignment and an address of size `wordSize`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642373830 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1642365486 From rcastanedalo at openjdk.org Thu Jun 20 04:17:30 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 20 Jun 2024 04:17:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/937019ad..d722d4c7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=00-01

  Stats: 17 lines in 1 file changed: 9 ins; 3 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Thu Jun 20 04:17:30 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 20 Jun 2024 04:17:30 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 19 Jun 2024 08:56:27 GMT, Roberto Castañeda Lozano wrote:

>> Yes.
>>
>> Two nits: add `can_` to those two bools and unpack the final return expr, sth like:
>>
>>
>> int barriers = 0;
>>
>> if (!can_remove_pre...) {
>>   barriers |= pre;
>> }
>> if (!can_remove_post...) {
>>   barriers |= post;
>> }
>>
>> return barriers;
>
> Thanks, I will do some testing before merging.

Done (commit d722d4c7c1534794aaa38d54ecf7a4c12b158e84).
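
The "add rather than remove" shape suggested in the quoted review can be sketched as a small self-contained function. This is only an illustration of the pattern: the flag constants `kPreBarrier` and `kPostBarrier` and the function name `store_barrier_data` are hypothetical stand-ins, not the identifiers used in `G1BarrierSetC2::get_store_barrier()`.

```cpp
// Hypothetical barrier tags; the real HotSpot constants are named differently.
constexpr int kPreBarrier  = 1 << 0;
constexpr int kPostBarrier = 1 << 1;

// Start from "no barriers" and add each tag only when the corresponding
// barrier cannot be removed, instead of starting from "all barriers"
// and masking tags out.
inline int store_barrier_data(bool can_remove_pre_barrier,
                              bool can_remove_post_barrier) {
  int barriers = 0;
  if (!can_remove_pre_barrier) {
    barriers |= kPreBarrier;
  }
  if (!can_remove_post_barrier) {
    barriers |= kPostBarrier;
  }
  return barriers;
}
```

Building the mask additively keeps each condition next to the tag it controls, so a reader does not have to track which bits were set at the start.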
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1646903037 From rcastanedalo at openjdk.org Thu Jun 20 04:39:10 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 20 Jun 2024 04:39:10 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 08:45:45 GMT, Albert Mingkun Yang wrote: >> Note that if we want to optimize the barrier code layout (see the [JEP description](https://openjdk.org/jeps/475), *Candidate optimizations* sub-section), splitting the assembly of each barrier in at least two blocks is necessary, since we need to separate the inline from the out-of-line (barrier stub) code. And since the assembly code has to be split into multiple functions anyway, I think it makes sense to group the code by logical blocks (different barrier tests, queue insertion, etc.), as proposed in this changeset. This also improves code reuse, e.g. the same `generate_queue_insertion` implementation is used for the pre- and post-barriers. >> If you still think there is value in grouping together the blocks that can be grouped together (e.g. `generate_single_region_test` + `generate_new_val_null_test` + `generate_card_young_test`), I can prototype the refactoring and let the G1 maintainers decide which alternative is more readable/maintainable. > >> This also improves code reuse > > In this area, I think code duplication is less of an issue -- it's more crucial that one can follow the asm flow as if reading real asm. (Ofc, this is subjective; feel free to keep as is.) I personally find the current style more maintainable in that it 1) makes the high-level structure of the barriers more explicit and 2) reuses code across the pre- and post-barrier (the fact that queue insertion is an identical operation for both barriers is obscured in mainline by naming and instruction choice differences). 
Having said that, if there is consensus among G1 maintainers that generating the fast (inline) and slow (out of line) paths of each barrier in single functions (sacrificing reuse of the queue insertion logic) is preferable, I'm happy to change that. @tschatzl @kimbarrett what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1646925045 From stuefe at openjdk.org Thu Jun 20 05:56:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 05:56:17 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 09:15:46 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. 
>>
>> The results are as follows on linux-x64-slowdebug:
>>
>>
>> Generate stacks... Done
>> Time taken with GrowableArray: 8341.240945
>> Time taken with CHeap: 12189.031318
>> Time taken with Arena: 8800.703092
>> Time taken with GrowableArray again: 8295.508829
>>
>>
>> And on linux-x64:
>>
>>
>> Time taken with GrowableArray: 8378.018135
>> Time taken with CHeap: 12437.347868
>> Time taken with Arena: 8758.064717
>> Time taken with GrowableArray again: 8391.076291
>>
>>
>> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix tests

@jdksjolen sorry, had this review sitting in pending state for days apparently and forgot to send it off.

One thing I don't understand. I thought the point of this new class is to move the **stacks** over to the new allocator-with-freelist, and make StackIndex a simple typedef for the allocator index? Then, get rid of the GrowableArray? You put the hashtable entries in there, which is okay, but not using the full potential of this new class.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 34:

> 32: // memory savings if a pointer-heavy self-referential structure is used.
> 33: // It is "indexed" as a reference is base + index * sizeof(E).
> 34: // It never returns any memory to the system.

Proposal: A flat array of elements E, backed by C-heap, growing on-demand. It allows for returning arbitrary elements and keeps them in a freelist. Elements can be uniquely identified via array index.

src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 36:

> 34: // It never returns any memory to the system.
> 35: template
> 36: class IndexedFreeListAllocator {

This is a mouthful and does not describe well what it does.
How about "HomogenousObjectHeap" or "HomogenousObjectArray" or "HomogenousObjectSpace" src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 38: > 36: class IndexedFreeListAllocator { > 37: public: > 38: using I = int32_t; Proposal, possibly for a follow-up RFE: - its a pity to bury this in nmt, we may have uses for it elsewhere - If we only have a small number of elements, I don't want to burn 4 bytes on an index. - I dislike the fact that we burn half the value range of index to "invalid" - AFAICS we don't have a index overflow check in allocate (what happens with >2 billion elements?) So: - make this a general-purpose container in utilities - I would make the index type a template parameter, (X) with the rule that it should be an unsigned integral (and STATIC_ASSERT that) - I would then: constexpr X nil = std::numeric_limits::max(); constexpr X max = nil - 1; Then, down in allocate, make sure we don't go over `max`. Now, If I know that I only need e.g. <255 elements, I can use a single byte as index, that could give me good packing depending on where the index is stored. And we get index overflow checking inbuilt, too. src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 82: > 80: } > 81: > 82: void free(I i) { `free_element` ? Or `return_element`? I dislike `free` as a name because of the conflict with free. For symmetry, then we need `allocate_element`. src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 97: > 95: > 96: const E& at(I i) const { > 97: assert(i != nil, "null pointer dereference"); Not needed? src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 44: > 42: // We achieve this by using a closed hashtable for finding previously existing NCS:s and referring to them by an index that's smaller than a pointer. > 43: template class ALLOCATOR> > 44: class NativeCallStackStorageWithAllocator : public CHeapObj { I don't think we need a templatized version of NativeCallStackStorage. Please let's keep it simple and stupid. 
We will only ever have one form of NativeCallStackStorage. Should we ever need multiple forms of NativeCallStackStorage simultaneously, we can still templatize. src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 82: > 80: LinkPtr next; > 81: StackIndex stack; > 82: Link(LinkPtr next, StackIndex v) Pre-existing. We should never modify the hash table, so the stackindex can be const. ------------- PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2124550622 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643868189 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643866034 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643872807 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643890313 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1643900631 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1646970936 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1646977747 From stuefe at openjdk.org Thu Jun 20 05:56:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 05:56:17 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21] In-Reply-To: References: Message-ID: <6iyt0NEYAh4V4M8AlsSL0PtPP8E3KzOn8xHZKVDBAkk=.dfb3d6ec-b974-48f8-854e-2429c0e6e64c@github.com> On Thu, 20 Jun 2024 05:25:59 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 44: > >> 42: // We achieve this by using a closed hashtable for finding previously existing NCS:s and referring to them by an index that's smaller than a pointer. 
>> 43: template class ALLOCATOR> >> 44: class NativeCallStackStorageWithAllocator : public CHeapObj { > > I don't think we need a templatized version of NativeCallStackStorage. Please let's keep it simple and stupid. We will only ever have one form of NativeCallStackStorage. > > Should we ever need multiple forms of NativeCallStackStorage simultaneously, we can still templatize. Making this a template also makes it difficult to move implementation over to a cpp file. Something which you really should do, since the implementation here is non-trivial. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1646993299 From jbhateja at openjdk.org Thu Jun 20 06:01:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 20 Jun 2024 06:01:39 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v8] In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: > Intel? Advanced Performance Extensions (Intel? APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. > > Summary of changes introduced along with this patch:- > > 1. C2 compiler register allocation support. > 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. > 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. > 4. Applicable extensions to native interface used by runtime for patching instruction. > > We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits > (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves > remaining register for special purpose. 
> > Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. > > We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes > found during testing. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19042/files - new: https://git.openjdk.org/jdk/pull/19042/files/4ecca0f2..e2e2bc59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19042&range=06-07 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19042/head:pull/19042 PR: https://git.openjdk.org/jdk/pull/19042 From jbhateja at openjdk.org Thu Jun 20 06:01:40 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 20 Jun 2024 06:01:40 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v7] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> <1gSd6mI4H5dykQJLrg9GSyfaW71GjWKaaO86Bwl7Maw=.99e79c49-6829-4072-a8c1-f0674b8295b6@github.com> Message-ID: On Wed, 19 Jun 2024 20:59:37 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 11080: >> >>> 11078: void Assembler::vpgatherdd(XMMRegister dst, Address src, XMMRegister mask, int vector_len) { >>> 11079: assert(VM_Version::supports_avx2(), ""); >>> 11080: assert(!needs_eevex(src.base(), src.index()), "does not support extended gprs as BASE or INDEX of address 
operand"); >> >> Why this new assert in vpgatherdd? > > But the index here is xmm register which could be xmm16 to xmm31 so the assert needs to be corrected. Also good to have the similar assert in vpgatherdq. That's correct, GATHER uses VSIB encoding where only base address is passed though a GPR while gather indexes are passed using vector register. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19042#discussion_r1646999007 From mbaesken at openjdk.org Thu Jun 20 06:18:16 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 20 Jun 2024 06:18:16 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v6] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 13:50:40 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add blank line Thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19722#issuecomment-2179899486 From mbaesken at openjdk.org Thu Jun 20 06:18:17 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 20 Jun 2024 06:18:17 GMT Subject: Integrated: 8334239: Introduce macro for ubsan method/function exclusions In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 13:39:14 GMT, Matthias Baesken wrote: > A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). 
> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp).
> Currently something like this is used :
>
> #if defined(__clang__) || defined(__GNUC__)
> __attribute__((no_sanitize("undefined")))
> #endif

This pull request has now been integrated.

Changeset: ff302409
Author:    Matthias Baesken
URL:       https://git.openjdk.org/jdk/commit/ff30240926224b2f98e173bcd606c157af788919
Stats:     55 lines in 4 files changed: 46 ins; 6 del; 3 mod

8334239: Introduce macro for ubsan method/function exclusions

Reviewed-by: stefank, stuefe, kbarrett

-------------

PR: https://git.openjdk.org/jdk/pull/19722

From aph at openjdk.org Thu Jun 20 07:49:12 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 20 Jun 2024 07:49:12 GMT
Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v8]
In-Reply-To:
References: <_tVvVlAaTUHImBFbxSp67liIoh0toRLkmI_FPwN0vy0=.85da9026-0218-48da-b400-d0e97365f997@github.com> <9ZIxjqK5lFNFQh6S5IIO_TM-olZpMa5rLiKWpHEXXEw=.7987809a-be5a-451e-a03e-7bc41073bc56@github.com>
Message-ID:

On Wed, 19 Jun 2024 17:12:15 GMT, Gui Cao wrote:

> With the jmh test data, we see that there is a performance decrease from testNegative55 to testNegative64 when zbb is not available. This is because a loop is needed to count the number of 1 when Zbb is not available. I've tested this before on Banana Pi BPI-F3 board (has Zbb) and Disable UseZbb to using scalar registers as well, with similar data. When Zbb is available, the test data is better in all test cases except testNegative63, testNegative64. So the performance decrease here compared to when using Zbb is caused by the number of 1's counted using the scalar register loop.

Yes, exactly. There are faster general-purpose ways to calculate popcount (there's one in share/utilities/population_count.hpp) but the simple loop is better for small sets, which they almost always are in this case. And small code matters, because this is expanded a lot.
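
The kind of simple loop referred to above can be sketched in portable C++ as follows. This is a minimal illustration and not the RISC-V macro-assembler code from the PR: the function name `popcount_loop` is hypothetical, and the sketch assumes the loop clears the lowest set bit on each pass (decrement, then AND), which is what the quoted `addi(tmp2, tmp1, -1)` instruction suggests.

```cpp
#include <cstdint>

// Hypothetical helper: counts set bits by clearing the lowest set bit
// once per iteration, so it runs exactly one iteration per set bit.
inline unsigned popcount_loop(uint64_t bits) {
  unsigned count = 0;
  while (bits != 0) {
    bits &= bits - 1;  // clear the lowest set bit
    ++count;
  }
  return count;
}
```

With only a few bits set the loop exits after a few iterations, while a bitmap with all 64 bits set takes 64 iterations, which is consistent with the slowdown reported for the highest testNegative cases when Zbb is unavailable.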
We could have an early exit for `SECONDARY_SUPERS_BITMAP_FULL` but because that doesn't occur in real code it'd only slow things down. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1647118639 From sgehwolf at openjdk.org Thu Jun 20 08:34:45 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 20 Jun 2024 08:34:45 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v2] In-Reply-To: References: Message-ID: <7NnRX5w4y2cPb9q5BDEH8UgwWnMS5u1nUYYOrS0YIaI=.2f8ee7b9-51d9-4493-89b6-b733026a6614@github.com> > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comments - 8333446: Add tests for hierarchical container support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/98d780ac..00b528ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=00-01 Stats: 45352 lines in 1129 files changed: 26950 ins; 13694 del; 4708 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From jsjolen at openjdk.org Thu Jun 20 08:37:20 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 20 Jun 2024 08:37:20 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 06:16:30 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 38: > >> 36: class IndexedFreeListAllocator { >> 37: public: >> 38: using I = int32_t; > > Proposal, possibly for a follow-up RFE: > > - its a pity to bury this in nmt, we may have uses for it elsewhere > - If we only have a small number of elements, I don't want to burn 4 bytes on an index. > - I dislike the fact that we burn half the value range of index to "invalid" > - AFAICS we don't have a index overflow check in allocate (what happens with >2 billion elements?) 
> > So: > > > - make this a general-purpose container in utilities > - I would make the index type a template parameter, (X) with the rule that it should be an unsigned integral (and STATIC_ASSERT that) > - I would then: > > > constexpr X nil = std::numeric_limits::max(); > constexpr X max = nil - 1; > > > Then, down in allocate, make sure we don't go over `max`. > > Now, If I know that I only need e.g. <255 elements, I can use a single byte as index, that could give me good packing depending on where the index is stored. And we get index overflow checking inbuilt, too. This is what I'd like to have in a follow-up RFE. Your last two points are because of `GrowableArray`: GA uses int32, so we do, and GA checks overflow when expanding, so we're OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1647196145 From erikj at openjdk.org Thu Jun 20 08:39:13 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Thu, 20 Jun 2024 08:39:13 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). 
> > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows Build changes look ok. ------------- Marked as reviewed by erikj (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19478#pullrequestreview-2129807207 From jsjolen at openjdk.org Thu Jun 20 08:41:12 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 20 Jun 2024 08:41:12 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21] In-Reply-To: References: Message-ID: <05mACj5OQhpSekocsvQCSZMgWF2iDk8g9nnuKFxzsjY=.0f0751c1-92bc-40f5-8534-c34fe45ce54e@github.com> On Tue, 18 Jun 2024 06:26:25 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 82: > >> 80: } >> 81: >> 82: void free(I i) { > > `free_element` ? Or `return_element`? I dislike `free` as a name because of the conflict with free. For symmetry, then we need `allocate_element`. Just `deallocate`? > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 97: > >> 95: >> 96: const E& at(I i) const { >> 97: assert(i != nil, "null pointer dereference"); > > Not needed? You're probably right, I'm not sure what a `const IFLA` would be. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1647202494 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1647201951 From stuefe at openjdk.org Thu Jun 20 08:42:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 08:42:15 GMT Subject: RFR: 8322475: Extend printing for System.map [v5] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 17:06:59 GMT, Johan Sj?len wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 20 commits: >> >> - copyrights >> - Merge branch 'master' into System.maps-more-info >> - fix merge issue >> - Merge branch 'master' into System.maps-more-info >> - fix whitespace issue >> - wip >> - exhuming >> - Merge branch 'master' into System.maps-more-info >> - Merge >> - remove codecache name printing >> - ... and 10 more: https://git.openjdk.org/jdk/compare/91bd85d6...231a8a91 > > src/hotspot/os/linux/memMapPrinter_linux.cpp line 161: > >> 159: >> 160: void MemMapPrinter::pd_print_all_mappings(const MappingPrintSession& session) { >> 161: constexpr char filename[] = "/proc/self/smaps"; > > Is this non-constexpr if it's a `const char *` instead of `char[]`? Not sure, tbh. `sizeof` is different: `sizeof` of a `constexpr char[]` gives you the string size including the terminating 0, while `sizeof` of a `constexpr char*` gives you the pointer size. But the pointer points to runtime addresses, even if those are in the constant segment, so how could it be constexpr? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1647203127 From jsjolen at openjdk.org Thu Jun 20 08:44:12 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 20 Jun 2024 08:44:12 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 06:08:57 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/share/nmt/indexedFreeListAllocator.hpp line 36: > >> 34: // It never returns any memory to the system. >> 35: template >> 36: class IndexedFreeListAllocator { > > This is a mouthful and does not describe well what it does.
How about "HomogenousObjectHeap" or "HomogenousObjectArray" or "HomogenousObjectSpace" "HomogenousObjectArray" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1647206953 From jsjolen at openjdk.org Thu Jun 20 08:49:12 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 20 Jun 2024 08:49:12 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v21] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 05:53:08 GMT, Thomas Stuefe wrote: >One thing I don't understand. I thought the point of this new class is to move the stacks over to the new allocator-with-freelist, and make StackIndex a simple typedef for the allocator index? Then, get rid of the GrowableArray? >You put the hashtable entries in there, which is okay, but not using the full potential of this new class. No, the initial point was to get rid of pointer-heavy structure's unnecessarily large pointers. The discussion of StackIndex and its wrapping came later in the review process. Moving the `StackIndex` to a simple typedef is its own RFE. If I were to use `HOA` for the stacks in this PR, then I wouldn't do the refactoring of `StackIndex` into a typedef :-). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2180149420 From jsjolen at openjdk.org Thu Jun 20 08:53:10 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 20 Jun 2024 08:53:10 GMT Subject: RFR: 8322475: Extend printing for System.map [v5] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 08:39:11 GMT, Thomas Stuefe wrote: >so how could it be constexpr? Is this a nerd snipe :)? I'm fine with this being kept as is. 
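A minimal sketch of the `sizeof` difference discussed in this thread, using only standard C++ (nothing here is HotSpot code, and the variable names are made up for illustration). As far as I know, the pointer form is also accepted as `constexpr`, since the address of a string literal counts as a constant expression even though its numeric value is only fixed at link or load time. What the pointer form loses is the build-time length.

```cpp
#include <cassert>
#include <cstddef>

// Array form: the type is char[17], so sizeof sees the whole string,
// including the terminating NUL.
constexpr char filename_array[] = "/proc/self/smaps";

// Pointer form: still a valid constexpr declaration, but sizeof only
// sees the pointer itself.
constexpr const char* filename_ptr = "/proc/self/smaps";

static_assert(sizeof(filename_array) == 17, "16 characters plus NUL");
static_assert(sizeof(filename_ptr) == sizeof(void*), "pointer size only");

// Build-time string length, available only with the array form.
constexpr std::size_t filename_len = sizeof(filename_array) - 1;
static_assert(filename_len == 16, "length without the terminating NUL");
```

The build-time length is the property the array form is preferred for here.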
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1647219418 From stuefe at openjdk.org Thu Jun 20 09:11:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 09:11:12 GMT Subject: RFR: 8322475: Extend printing for System.map [v5] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 08:50:31 GMT, Johan Sj?len wrote: > > so how could it be constexpr? > > Is this a nerd snipe :)? I'm fine with this being kept as is. ? No :) What I meant is: If I have `constexpr const char* f = "ccc"` and sizeof(f) is pointer size, I imagine there to be a pointer somewhere that points to a constant string literal in the constant segment. But if it is a real pointer, it would have to be set after program load, since only then you know the address of the string literal. Idk. Did not investigate deeply. I like the ability of build time stringlen via sizeof for constexpr char[], that's why I prefer it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1647245611 From shade at openjdk.org Thu Jun 20 09:16:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Jun 2024 09:16:22 GMT Subject: RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 Message-ID: As shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock. We can untie the deadlock by checking if `ServiceLock` can be acquired on triggering path, and never blocking otherwise. 
Additional testing: - [ ] Linux x86_64 service fastdebug, `all` - [ ] Linux AArch64 service fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/19800/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19800&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334594 Stats: 12 lines in 6 files changed: 3 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/19800.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19800/head:pull/19800 PR: https://git.openjdk.org/jdk/pull/19800 From stuefe at openjdk.org Thu Jun 20 09:16:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 09:16:19 GMT Subject: RFR: 8322475: Extend printing for System.map [v5] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 17:02:53 GMT, Johan Sj?len wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: >> >> - copyrights >> - Merge branch 'master' into System.maps-more-info >> - fix merge issue >> - Merge branch 'master' into System.maps-more-info >> - fix whitespace issue >> - wip >> - exhuming >> - Merge branch 'master' into System.maps-more-info >> - Merge >> - remove codecache name printing >> - ... and 10 more: https://git.openjdk.org/jdk/compare/91bd85d6...231a8a91 > > src/hotspot/os/linux/memMapPrinter_linux.cpp line 41: > >> 39: size_t _vsize; // combined virtual size >> 40: size_t _rss; // combined resident set size >> 41: size_t _committed; // combined committed space > > What's the difference between a "space" and a "size"? None, will fix. > src/hotspot/os/linux/memMapPrinter_linux.cpp line 83: > >> 81: outputStream* st = _session.out(); >> 82: #define INDENT_BY(n) \ >> 83: if (st->fill_to(n) == 0) { \ > > `fill_to` returns `void`, am I missing something? It was changed in this PR to return the number of spaces actually printed. 
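To picture a `fill_to` that reports how many spaces it printed, here is a small stand-alone sketch. `ColumnStream` and `indent_by` are hypothetical stand-ins, not the HotSpot `outputStream` or the `INDENT_BY` macro from the PR. The body of the real macro is not shown in the quoted snippet, so emitting a single separator space when no padding was written is purely an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a stream that tracks its current column.
class ColumnStream {
  std::string _buf;
  int _pos = 0;  // column position within the current line
public:
  void print(const std::string& s) {
    for (char c : s) {
      _buf += c;
      _pos = (c == '\n') ? 0 : _pos + 1;
    }
  }
  // Pads with spaces up to column `col` and returns the number of spaces
  // actually written (0 if the stream is already at or past `col`).
  int fill_to(int col) {
    int written = 0;
    while (_pos < col) {
      _buf += ' ';
      ++_pos;
      ++written;
    }
    return written;
  }
  const std::string& str() const { return _buf; }
};

// Assumed use of the return value: if the previous field overflowed its
// column and no padding was written, still separate it from the next field.
inline void indent_by(ColumnStream& st, int col) {
  if (st.fill_to(col) == 0) {
    st.print(" ");
  }
}
```

With a `void` return, the caller could not tell whether the previous column had already run past the target position.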
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1647249197 PR Review Comment: https://git.openjdk.org/jdk/pull/17158#discussion_r1647251870 From stuefe at openjdk.org Thu Jun 20 09:31:48 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 09:31:48 GMT Subject: RFR: 8322475: Extend printing for System.map [v6] In-Reply-To: References: Message-ID: <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com> > This is an expansion on the new `System.map` command introduced with JDK-8318636. > > We now print valuable information per memory region, such as: > > - the actual resident set size > - the actual number of huge pages > - the actual used page size > - the THP state of the region (was advised, is eligible, uses THP, ...) > - whether the region is shared > - whether the region had been committed (backed by swap) > - whether the region has been swapped out. > > Example output: > > > from to size rss hugetlb pgsz prot notes vm info/file > 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage > 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage > 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java > 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') > 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') > 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(CodeHeap 
'non-profiled nmethods') > 0x00007f3a7c802000 - 0x00007f3a839f200... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - feedback johan - fix merge errors - Merge branch 'master' into System.maps-more-info - copyrights - Merge branch 'master' into System.maps-more-info - fix merge issue - Merge branch 'master' into System.maps-more-info - fix whitespace issue - wip - exhuming - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de ------------- Changes: https://git.openjdk.org/jdk/pull/17158/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17158&range=05 Stats: 614 lines in 14 files changed: 425 ins; 103 del; 86 mod Patch: https://git.openjdk.org/jdk/pull/17158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17158/head:pull/17158 PR: https://git.openjdk.org/jdk/pull/17158 From stuefe at openjdk.org Thu Jun 20 09:31:48 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 09:31:48 GMT Subject: RFR: 8322475: Extend printing for System.map [v5] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 17:16:20 GMT, Johan Sj?len wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: >> >> - copyrights >> - Merge branch 'master' into System.maps-more-info >> - fix merge issue >> - Merge branch 'master' into System.maps-more-info >> - fix whitespace issue >> - wip >> - exhuming >> - Merge branch 'master' into System.maps-more-info >> - Merge >> - remove codecache name printing >> - ... and 10 more: https://git.openjdk.org/jdk/compare/91bd85d6...231a8a91 > > A first round with this code. Thanks @jdksjolen ! Feedback integrated. Keep them coming. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17158#issuecomment-2180231986 From stuefe at openjdk.org Thu Jun 20 09:41:48 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 09:41:48 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v3] In-Reply-To: References: Message-ID: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - fix windows build - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - caching - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - exclude macos from testing source info - copyrights - test - JDK-8333994-NMT-call-stacks-should-show-source-information ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19655/files - new: https://git.openjdk.org/jdk/pull/19655/files/63240369..0c8e98ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=01-02 Stats: 15120 lines in 185 files changed: 11951 ins; 1871 del; 1298 mod Patch: https://git.openjdk.org/jdk/pull/19655.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19655/head:pull/19655 PR: https://git.openjdk.org/jdk/pull/19655 From stefank at openjdk.org Thu Jun 20 09:46:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 20 Jun 2024 09:46:18 GMT Subject: RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: <2FpJxRnMwrCrs1VPpDIrvA2KwtC0aJSacgceNUXW_K4=.27f6882b-23ee-4cb9-a4dc-8ce3b568a9d4@github.com> On Thu, 20 Jun 2024 09:11:04 GMT, Aleksey Shipilev wrote: > As 
shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock. We can untie the deadlock by checking if `ServiceLock` can be acquired on triggering path, and never blocking otherwise. > > Additional testing: > - [ ] Linux x86_64 service fastdebug, `all` > - [ ] Linux AArch64 service fastdebug, `all` Looks good to me. Thanks for fixing this. For those not reading the JBS entry: We might want to do a follow-up to limit the usage of the Service_lock so that we don't hold it while processing the JVMTI oops. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19800#pullrequestreview-2129957643 From eosterlund at openjdk.org Thu Jun 20 10:00:11 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 20 Jun 2024 10:00:11 GMT Subject: RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 09:11:04 GMT, Aleksey Shipilev wrote: > As shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock. We can untie the deadlock by checking if `ServiceLock` can be acquired on triggering path, and never blocking otherwise. > > Additional testing: > - [ ] Linux x86_64 service fastdebug, `all` > - [ ] Linux AArch64 service fastdebug, `all` This looks good as a direct fix to the bug. I agree though with the assessment that we should use a different lock for the queue going forward. It's also interesting to read the placement of these hooks. For the GC VM operation there is a comment saying that we probably just used the oop map cache so now is a good time to trigger cleanup. The same comment is present for ZGC and Shenandoah where we perform safepoints. But there the situation is typically the exact opposite to what the comment suggests then. 
Since we perform concurrent root scanning, we are *just about to* use the oop map cache, and the placement is probably the most unfortunate instead. It seems like we clean the caches *just before* using them instead of just after. But again, that seems like a follow-up thing. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19800#pullrequestreview-2129994072 From shade at openjdk.org Thu Jun 20 10:22:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Jun 2024 10:22:11 GMT Subject: RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 09:57:18 GMT, Erik Österlund wrote: > The same comment is present for ZGC and Shenandoah where we perform safepoints. But there the situation is typically the exact opposite to what the comment suggests then. Since we perform concurrent root scanning, we are just about to use the oop map cache, and the placement is probably the most unfortunate instead. It seems like we clean the caches just before using them instead of just after. But again, that seems like a follow-up thing. Yeah, we did it out of symmetry with other (STW) GC ops. We can actually move these around to the exact places in Shenandoah and ZGC right after concurrent root handling happens. Relaxing the requirement for actually locking ServiceLock with this particular fix would help safety when we move the trigger around. (Re phasing: at least for Shenandoah, the start-update-refs op would start soon after root processing and evac is over, so we get the _after_ effect as well. There might be additional contention on reclamation queue and GlobalCounter that we can avoid, though).
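The "never block on the triggering path" idea can be sketched with standard C++ primitives. This is only an illustration under assumptions: `service_lock`, `service_cv`, `cleanup_requested` and `try_trigger_cleanup` are made-up stand-ins, and the actual fix uses HotSpot's own lock types rather than `std::mutex`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Stand-ins for the service lock and the state it protects (hypothetical).
std::mutex service_lock;
std::condition_variable service_cv;
bool cleanup_requested = false;

// Opportunistic trigger: tries to take the lock and gives up immediately
// if it is not available, so the caller can never deadlock on it.
// Returns true if the notification was posted.
bool try_trigger_cleanup() {
  std::unique_lock<std::mutex> ml(service_lock, std::try_to_lock);
  if (!ml.owns_lock()) {
    return false;  // lock not acquired: skip, a later trigger can retry
  }
  cleanup_requested = true;
  service_cv.notify_all();
  return true;
}
```

Skipping is acceptable only because the notification is opportunistic: a missed trigger delays cleanup until a later trigger, it does not lose work.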
------------- PR Comment: https://git.openjdk.org/jdk/pull/19800#issuecomment-2180325904 From rehn at openjdk.org Thu Jun 20 11:20:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Jun 2024 11:20:17 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v12] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 08:07:57 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: >> >> - Review comments, removed dead code. >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Remove tmp file >> - Prepare for dynamic NativeCall size >> - Only allow one calling convetion, i.e. fixed sized >> - Merge branch 'master' into 8332689 >> - ... and 8 more: https://git.openjdk.org/jdk/compare/cc64aeac...f1dd3e16 > > src/hotspot/cpu/riscv/codeBuffer_riscv.cpp line 74: > >> 72: >> 73: assert(requests->number_of_entries() >= 1, "at least one"); >> 74: const int total_requested_size = MacroAssembler::max_patchable_far_call_stub_size() * requests->number_of_entries(); > > I see mixed uses of both `MacroAssembler::max_patchable_far_call_stub_size()` and `MacroAssembler::NativeShortCall::trampoline_size` in this function. As this is only used under `UseTrampolines`, it seems more reasonable to use `MacroAssembler::NativeShortCall::trampoline_size`. As you can see in the diff, the mixing is pre-existing; I only changed names. Fixed. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3932: > >> 3930: return instruction_size + MacroAssembler::NativeShortCall::trampoline_size; >> 3931: } >> 3932: return 2 * wordSize; > > Seems to me that this should be `instruction_size + wordSize`? That is one possible nop for alignment and an address of size `wordSize`. Yes, thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1647412264 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1647410894 From rehn at openjdk.org Thu Jun 20 11:24:24 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Jun 2024 11:24:24 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v14] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > With a very small application, or one that runs only for a very short time, we get fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not your hot calls are fast relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx. > > This patch suggests solving the problems with trampolines by taking a small penalty in the naive case of JAL to **dest**, > and instead doing by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of the issues which arm has)) when in reach, and vice versa.
> > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Minor review comments - Merge branch 'master' into 8332689 - To be pushed - Merge branch 'master' into 8332689 - Review comments, removed dead code. - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - ... and 12 more: https://git.openjdk.org/jdk/compare/d7dad50a...e47f2454 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=13 Stats: 874 lines in 16 files changed: 611 ins; 168 del; 95 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From duke at openjdk.org Thu Jun 20 11:39:23 2024 From: duke at openjdk.org (Inigo Mediavilla Saiz) Date: Thu, 20 Jun 2024 11:39:23 GMT Subject: RFR: 8333775: Small improvement to outputStream auto-indentation mode [v2] In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 10:34:50 GMT, Thomas Stuefe wrote: >> Almost trivial enhancement. >> >> [JDK-8333211](https://bugs.openjdk.org/browse/JDK-8333211) added automatic indentation. 
Some changes to complement that: >> >> - let outputStream::set_autoindent() return the old value for later restoration >> - add an RAII object to enable autoindent and restore the old state when leaving. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > feedback johann @tstuefe I was wondering if you think that it would make sense to include an `auto_indent` field in the `streamIndentor` object, rather than having a `StreamAutoIndentor` ? The reason why I'm suggesting that is that I cannot think of a valid case where I'd want to use the `streamIndentor` with `auto_indent` disabled. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19592#issuecomment-2180460210 From sgehwolf at openjdk.org Thu Jun 20 11:50:11 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 20 Jun 2024 11:50:11 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v2] In-Reply-To: <7NnRX5w4y2cPb9q5BDEH8UgwWnMS5u1nUYYOrS0YIaI=.2f8ee7b9-51d9-4493-89b6-b733026a6614@github.com> References: <7NnRX5w4y2cPb9q5BDEH8UgwWnMS5u1nUYYOrS0YIaI=.2f8ee7b9-51d9-4493-89b6-b733026a6614@github.com> Message-ID: On Thu, 20 Jun 2024 08:34:45 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). 
Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comments > - 8333446: Add tests for hierarchical container support Anyone willing to review this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2180477503 From sgehwolf at openjdk.org Thu Jun 20 11:57:42 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 20 Jun 2024 11:57:42 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v5] In-Reply-To: References: Message-ID: > Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or by some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in non-containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is indicated by the following trace log line: > > > [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present > > > This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: > > > java -XshowSettings:system --version > Operating System Metrics: > Provider: cgroupv1 > System not containerized.
> openjdk 23-internal 2024-09-17 > OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) > > > The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. > > Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. > > Testing: > > - [x] GHA (risc-v failure seems infra related) > - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) > - [x] Some manual testing using cri-o > > Thoughts? Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge branch 'master' into jdk-8261242-is-containerized-fix - Merge branch 'master' into jdk-8261242-is-containerized-fix - Add doc for mountinfo scanning. - Unify naming of variables - Merge branch 'master' into jdk-8261242-is-containerized-fix - Merge branch 'master' into jdk-8261242-is-containerized-fix - jcheck fixes - Fix tests - Implement Metrics.isContainerized() - Some clean-up - ... 
and 4 more: https://git.openjdk.org/jdk/compare/01ee4241...7c163cb2 ------------- Changes: https://git.openjdk.org/jdk/pull/18201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=04 Stats: 406 lines in 19 files changed: 301 ins; 78 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/18201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18201/head:pull/18201 PR: https://git.openjdk.org/jdk/pull/18201 From sgehwolf at openjdk.org Thu Jun 20 12:06:43 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 20 Jun 2024 12:06:43 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v6] In-Reply-To: References: Message-ID: > Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or by some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in non-containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is indicated by the following trace log line: > > > [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present > > > This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: > > > java -XshowSettings:system --version > Operating System Metrics: > Provider: cgroupv1 > System not containerized. > openjdk 23-internal 2024-09-17 > OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) > > > The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers.
Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. > > Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. > > Testing: > > - [x] GHA (risc-v failure seems infra related) > - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) > - [x] Some manual testing using cri-o > > Thoughts? Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision: Remove problem listing of PlainRead which is reworked here ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18201/files - new: https://git.openjdk.org/jdk/pull/18201/files/7c163cb2..3d98cbc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18201/head:pull/18201 PR: https://git.openjdk.org/jdk/pull/18201 From sgehwolf at openjdk.org Thu Jun 20 12:06:48 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Thu, 20 Jun 2024 12:06:48 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2] In-Reply-To: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> References: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> Message-ID: On Tue, 16 Apr 2024 18:25:52 GMT, Thomas Stuefe wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: >> >> - Merge branch 'master' into jdk-8261242-is-containerized-fix >> - jcheck fixes >> - Fix tests >> - Implement Metrics.isContainerized() >> - Some clean-up >> - Drop cgroups testing on plain Linux >> - Implement fall-back logic for non-ro controller mounts >> - Make find_ro static and local to compilation unit >> - 8261242: [Linux] OSContainer::is_containerized() returns true > > I am not enough of a container expert to judge if the basic approach is right - I trust you on this. This is just a technical code review. @tstuefe Do you mind to take another look? Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2180504024 From coleenp at openjdk.org Thu Jun 20 12:41:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Jun 2024 12:41:12 GMT Subject: RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 09:11:04 GMT, Aleksey Shipilev wrote: > As shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock. We can untie the deadlock by checking if `ServiceLock` can be acquired on triggering path, and never blocking otherwise. > > Additional testing: > - [ ] Linux x86_64 service fastdebug, `all` > - [ ] Linux AArch64 service fastdebug, `all` Seems like a good workaround. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19800#pullrequestreview-2130311043 From jsjolen at openjdk.org Thu Jun 20 12:54:03 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 20 Jun 2024 12:54:03 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v22] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks...
Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Thomas comments - Another test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/f392c3b5..a2357e1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=20-21 Stats: 513 lines in 5 files changed: 255 ins; 253 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Thu Jun 20 12:56:51 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 20 Jun 2024 12:56:51 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v23] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes.
The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
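For readers unfamiliar with the idea, the description above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's `IndexedFreelistAllocator`: the class name `IndexedFreelist`, its members, and the use of `std::vector` in place of HotSpot's `GrowableArray` are all hypothetical. Elements are referred to by a 4-byte index rather than an 8-byte pointer, and freed slots are chained into a list that is reused before the array grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the PR's code. Elements live in a growable array
// and are addressed by a 4-byte index; freed slots form a singly linked list.
template <typename T>
class IndexedFreelist {
 public:
  using Index = std::uint32_t;
  static constexpr Index nil = UINT32_MAX;  // "null" index

  // Reuse the most recently freed slot if there is one, otherwise grow.
  Index allocate(const T& value) {
    if (_free_head != nil) {
      Index i = _free_head;
      _free_head = _slots[i].next_free;
      _slots[i].value = value;
      return i;
    }
    assert(_slots.size() < nil && "index space exhausted");
    _slots.push_back(Slot{value, nil});
    return static_cast<Index>(_slots.size() - 1);
  }

  // Push the slot onto the free list; the backing array never shrinks.
  void deallocate(Index i) {
    assert(i < _slots.size());
    _slots[i].next_free = _free_head;
    _free_head = i;
  }

  T& at(Index i) { return _slots[i].value; }
  std::size_t capacity() const { return _slots.size(); }

 private:
  // For simplicity each slot stores both the payload and the free-list link;
  // a space-conscious version would overlay the two.
  struct Slot {
    T value;
    Index next_free;
  };
  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

With 8-byte native pointers, replacing each hash-table bucket by such an index saves 4 bytes per bucket, which is where a figure like `4 * 4099` bytes would come from.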
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Fix include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/a2357e1a..3859223c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=21-22 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Thu Jun 20 13:06:38 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 20 Jun 2024 13:06:38 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v24] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
> > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Rename free to deallocate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/3859223c..479e9573 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=22-23 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From alanb at openjdk.org Thu Jun 20 13:09:12 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 20 Jun 2024 13:09:12 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries.
This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows The changes to the launcher look okay. The move from `ifdef STATIC_BUILD` to `JLI_IsStaticallyLinked` is quite nice. Having libjdwp link to libjvm was a surprise but I think okay. I think it would be useful to provide a brief summary of when/where the static builds are tested to ensure that the changes don't bit rot. I realise we already have static builds but it isn't obvious where this is tested. src/hotspot/share/runtime/linkType.cpp line 36: > 34: return JNI_TRUE; > 35: #else > 36: return JNI_FALSE; bool != jboolean, I assume you don't want that. The naming is a bit unusual, a function that returns a boolean is usually named is_XXX, but maybe there is a reason for this?
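To make the review comment concrete, here is a small sketch of the shape it seems to suggest: a predicate with an `is_` prefix that returns a C++ `bool` rather than the JNI constants. Both the function name `is_statically_linked_example` and the macro `EXAMPLE_STATIC_LINK` are hypothetical placeholders; the actual names used in `linkType.cpp` are not visible in the quoted fragment.

```cpp
#include <cassert>

// Hypothetical sketch, not the code under review. JNI_TRUE/JNI_FALSE are
// jboolean values (an unsigned 8-bit type in JNI); a function used only from
// C++ code can return a plain bool and follow the is_XXX naming convention.
bool is_statically_linked_example() {
#ifdef EXAMPLE_STATIC_LINK  // hypothetical build-time macro
  return true;
#else
  return false;
#endif
}
```

Whether the exported entry point should keep a `jboolean` return type depends on who calls it, which the thread does not settle at this point.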
------------- PR Comment: https://git.openjdk.org/jdk/pull/19478#issuecomment-2180635747 PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1647480992 From zgu at openjdk.org Thu Jun 20 13:17:11 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 20 Jun 2024 13:17:11 GMT Subject: RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 09:11:04 GMT, Aleksey Shipilev wrote: > As shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock. We can untie the deadlock by checking if `ServiceLock` can be acquired on triggering path, and never blocking otherwise. > > Additional testing: > - [ ] Linux x86_64 service fastdebug, `all` > - [ ] Linux AArch64 service fastdebug, `all` LGTM > This looks good as a direct fix to the bug. I agree though with the assessment that we should use a different lock for the queue going forward. > > It's also interesting to read the placement of these hooks. For the GC VM operation there is a comment saying that we probably just used the oop map cache so now is a good time to trigger cleanup. > > The same comment is present for ZGC and Shenandoah where we perform safepoints. But there the situation is typically the exact opposite to what the comment suggests then. Since we perform concurrent root scanning, we are _just about to_ use the oop map cache, and the placement is probably the most unfortunate instead. It seems like we clean the caches _just before_ using them instead just after. But again, that seems like a follow-up thing. The cleanup is on already stalled/evicted oop map entries, which are no longer accessible, so I don't see the placement issue. ------------- Marked as reviewed by zgu (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19800#pullrequestreview-2130401926 PR Comment: https://git.openjdk.org/jdk/pull/19800#issuecomment-2180657257 From fyang at openjdk.org Thu Jun 20 13:26:16 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Jun 2024 13:26:16 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v10] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 03:18:41 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. >> >> ### Correctness testing: >> >> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) >> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) >> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` >> >> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs >> Original: >> >> Benchmark Mode Cnt Score Error Units >> SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op >> SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 0.239 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 21.695 ± 0.458 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 24.855 ± 0.664 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 29.719 ± 0.385 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 35.856 ±
0.629 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 126.796 ± 0.916 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op >> Seco... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and fix population_count Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19320#pullrequestreview-2130427039 From stuefe at openjdk.org Thu Jun 20 13:32:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 13:32:13 GMT Subject: RFR: 8333775: Small improvement to outputStream auto-indentation mode [v2] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 11:36:52 GMT, Inigo Mediavilla Saiz wrote: > @tstuefe I was wondering if you think that it would make sense to include an `auto_indent` field in the `streamIndentor` object, rather than having a `StreamAutoIndentor` ? > > The reason why I'm suggesting that is that I cannot think of a valid case where I'd want to use the `streamIndentor` with `auto_indent` disabled. IIRC there are existing users of streamIndentor that don't expect automatic indentation.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19592#issuecomment-2180696649 From rcastanedalo at openjdk.org Thu Jun 20 13:45:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 20 Jun 2024 13:45:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 04:17:30 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags I will be away from keyboard until Aug 5, @fisk and @kimbarrett may be able to answer questions about porting in the meantime. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2180739531 From gcao at openjdk.org Thu Jun 20 13:45:16 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 20 Jun 2024 13:45:16 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v10] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 03:18:41 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction.
>> >> ### Correctness testing: >> >> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) >> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) >> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` >> >> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs >> Original: >> >> Benchmark Mode Cnt Score Error Units >> SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op >> SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 0.239 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 21.695 ± 0.458 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 24.855 ± 0.664 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 29.719 ± 0.385 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 35.856 ± 0.629 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 126.796 ±
0.916 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op >> Seco... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and fix population_count Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19320#issuecomment-2180739594 From gcao at openjdk.org Thu Jun 20 13:48:17 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 20 Jun 2024 13:48:17 GMT Subject: Integrated: 8332587: RISC-V: secondary_super_cache does not scale well In-Reply-To: References: Message-ID: On Tue, 21 May 2024 08:31:53 GMT, Gui Cao wrote: > Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. > This optimization depends on availability of the Zbb extension which has the cpop instruction. > > ### Correctness testing: > > - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) > - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) > - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs > Original: > > Benchmark Mode Cnt Score Error Units > SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op > SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op > SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 0.239 ns/op > SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op > SecondarySupersLookup.testNegative02 avgt 15 21.695 ± 0.458 ns/op > SecondarySupersLookup.testNegative03 avgt 15 24.855 ± 0.664 ns/op > SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op > SecondarySupersLookup.testNegative05 avgt 15 29.719 ±
0.385 ns/op > SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op > SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op > SecondarySupersLookup.testNegative08 avgt 15 35.856 ± 0.629 ns/op > SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op > SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op > SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op > SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op > SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op > SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op > SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op > SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op > SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op > SecondarySupersLookup.testNegative56 avgt 15 126.796 ± 0.916 ns/op > SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op > SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op > SecondarySupersLookup.testNegative59 avgt 15 131.858 ± 1.066 ns/op > SecondaryS... This pull request has now been integrated. Changeset: 001d6860 Author: Gui Cao Committer: Hamlin Li URL: https://git.openjdk.org/jdk/commit/001d6860199436c5fb14bd681d640d462b472015 Stats: 400 lines in 5 files changed: 398 ins; 0 del; 2 mod 8332587: RISC-V: secondary_super_cache does not scale well Reviewed-by: mli, fyang ------------- PR: https://git.openjdk.org/jdk/pull/19320 From shade at openjdk.org Thu Jun 20 14:10:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Jun 2024 14:10:10 GMT Subject: RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 09:11:04 GMT, Aleksey Shipilev wrote: > As shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock.
We can untie the deadlock by checking if `ServiceLock` can be acquired on triggering path, and never blocking otherwise. > > Additional testing: > - [x] Linux x86_64 service fastdebug, `all` > - [x] Linux AArch64 service fastdebug, `all` All tests pass. I'll wait until 24 hours expire and then integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19800#issuecomment-2180802390 From sviswanathan at openjdk.org Thu Jun 20 15:06:14 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 20 Jun 2024 15:06:14 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v8] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Thu, 20 Jun 2024 06:01:39 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose registers, also known as Extended General Purpose Registers, in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class.
>> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19042#pullrequestreview-2130698196 From lmesnik at openjdk.org Thu Jun 20 15:46:23 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 20 Jun 2024 15:46:23 GMT Subject: Integrated: 8332252: Clean up vmTestbase/vm/share In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 16:37:15 GMT, Leonid Mesnik wrote: > The vmTestbase/vm/share is a shared test library for vmTestbase tests. This library contains a lot of code that is used only by a small number of tests or not used at all. There are no plans to actively develop new tests in vmTestbase and improve this shared library. > The final goal of this and the following PRs is to reduce the maintenance cost of vmTestbase by eliminating this library. > > Also, this PR moves test-specific code into corresponding test directories to increase code locality. This makes it easier to move tests out of vmTestbase later. > > The few remaining classes include > InMemoryJavaCompiler.java > that is very similar to the same class from the standard testlibrary and could be merged with it and > ProcessUtils.java > which is used by > test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java > and thus should be moved into the standard testlibrary. > The stack and options might be merged in nsk/share test library. This pull request has now been integrated.
Changeset: a81e1bf1 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/a81e1bf1e1a6f00280b9be987c03fe20915fd52c Stats: 1669 lines in 45 files changed: 27 ins; 1595 del; 47 mod 8332252: Clean up vmTestbase/vm/share Reviewed-by: cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/19727 From never at openjdk.org Thu Jun 20 15:50:38 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Jun 2024 15:50:38 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v5] In-Reply-To: References: Message-ID: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'master' into tkr-genz - fix spelling of JVMCI - Merge branch 'master' into tkr-genz - Merge remote-tracking branch 'origin/master' into tkr-genz - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz - Fix riscv compilation - 8333300: [JVMCI] add support for generational ZGC - Merge branch 'master' into tkr-genz - Merge branch 'master' into tkr-genz - Use NativeAccess to read from handles - ... 
and 3 more: https://git.openjdk.org/jdk/compare/fad6644e...b4a82828 ------------- Changes: https://git.openjdk.org/jdk/pull/19490/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19490&range=04 Stats: 245 lines in 16 files changed: 194 ins; 10 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/19490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19490/head:pull/19490 PR: https://git.openjdk.org/jdk/pull/19490 From never at openjdk.org Thu Jun 20 15:50:38 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Jun 2024 15:50:38 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v5] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 20:47:05 GMT, Doug Simon wrote: >> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: >> >> - Merge branch 'master' into tkr-genz >> - fix spelling of JVMCI >> - Merge branch 'master' into tkr-genz >> - Merge remote-tracking branch 'origin/master' into tkr-genz >> - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz >> - Fix riscv compilation >> - 8333300: [JVMCI] add support for generational ZGC >> - Merge branch 'master' into tkr-genz >> - Merge branch 'master' into tkr-genz >> - Use NativeAccess to read from handles >> - ... and 3 more: https://git.openjdk.org/jdk/compare/fad6644e...b4a82828 > > src/hotspot/share/jvmci/jvmci_globals.cpp line 233: > >> 231: // Check if selected GC is supported by JVMCI and Java compiler >> 232: if (!gc_supports_jvmci()) { >> 233: fatal("JVMIC does not support the selected GC"); > > JVMIC -> JVMCI I fixed this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19490#discussion_r1647795397 From kvn at openjdk.org Thu Jun 20 15:53:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 20 Jun 2024 15:53:12 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v8] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Thu, 20 Jun 2024 06:01:39 GMT, Jatin Bhateja wrote: >> Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. >> >> Summary of changes introduced along with this patch:- >> >> 1. C2 compiler register allocation support. >> 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. >> 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. >> 4. Applicable extensions to native interface used by runtime for patching instruction. >> >> We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits >> (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves >> remaining register for special purpose. >> >> Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. >> >> We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes >> found during testing. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. Update is fine. Let me re-test it before you integrate (because of new asserts). ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19042#pullrequestreview-2130809688 From coleenp at openjdk.org Thu Jun 20 16:33:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Jun 2024 16:33:10 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v3] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Wed, 19 Jun 2024 15:06:25 GMT, Axel Boldt-Christmas wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. 
>> >> Currently running tier1-tier8 testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Rename and comment SystemDictionary::methods_do If the default is to not keep the CLD alive, I don't like that we need the details of the side effect in the name. Just call it classes_do, etc. I don't care about no-keepalive in all these callers, if that's the right answer for most of these callers. Put the side effect in the name of the exceptional cases. We had the iterator keep things alive for safety in the cases where we could have the CLD unload while we were looking at it. In all these cases, this wasn't needed? Still have to work through that. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2130887817 From stefank at openjdk.org Thu Jun 20 16:56:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 20 Jun 2024 16:56:10 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v3] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Thu, 20 Jun 2024 16:30:30 GMT, Coleen Phillimore wrote: > If the default is to not keep the CLD alive, I don't like that we need the details of the side effect in the name. Just call it classes_do, etc. I don't care about no-keepalive in all these callers, if that's the right answer for most of these callers. Put the side effect in the name of the exceptional cases. I disagree and probably urged/influenced Axel to name these functions this way. Very few people understand that they need to be careful around liveness of CLDs and klasses when they use these iterators/visitors. We keep seeing bugs in this area. Either because the devs fail to keep the klasses alive, or as in this bug, all klasses become kept alive. 
We want a name that strongly suggests that these functions are not as innocent as one might think, and that if you are going to use these functions you need to go and look at the comments (or ask someone) what we mean with no-keepalive. > > We had the iterator keep things alive for safety in the cases where we could have the CLD unload while we were looking at it. It turns out that that safety causes significant problems for seldomly run, concurrent marking GCs. If you repeatedly use any of these APIs you in effect turn off class unloading when running with ZGC. > In all these cases, this wasn't needed? We don't think they are. It would be great if you could take an extra close look at the class redefinition code. > Still have to work through that. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19769#issuecomment-2181139233 From stuefe at openjdk.org Thu Jun 20 17:41:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 17:41:16 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v6] In-Reply-To: References: Message-ID: <3garHvE8lhPovujClt422-1pcIcs7z7zpqpngEHDd6w=.8776bce8-2b79-44bd-8355-d753562a75cf@github.com> On Thu, 20 Jun 2024 12:06:43 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. 
For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision: > > Remove problem listing of PlainRead which is reworked here Seems okay. I don't understand the depths of V1 vs V2 controller files as well as you do, @jerboaa , but I trust you there. The mechanics seem fine. 
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 373: > 371: * (11) mount source: matched with '%*s' and discarded > 372: * (12) super options: matched with '%*s' and discarded > 373: */ Thanks for the good comment. That scanf line is a brain teaser. src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 422: > 420: * (12) super options: matched with '%s' and captured in 'tmpcgroups' > 421: */ > 422: if (sscanf(p, "%*d %*d %*d:%*d %s %s %s%*[^-]- %s %*s %s", tmproot, tmpmount, mount_opts, tmp_fs_type, tmpcgroups) == 5) { The only difference to v1 is that we parse the super options (12), right? Could we factor out the parsing into a helper function? Or, alternatively, at least `#define` the scanf format somewhere up top, including the nice comment, and reuse that format string? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-2130943896 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1647881202 PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1647925120 From stuefe at openjdk.org Thu Jun 20 18:13:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jun 2024 18:13:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v24] In-Reply-To: References: Message-ID: <8INosu20fiBl-PmW_AUM2KLt2emJGHH-0ENqgF6aoy8=.55fe9c14-3b7d-454c-8161-d0f6cdc2af1c@github.com> On Thu, 20 Jun 2024 13:06:38 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. 
For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Rename free to deallocate src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 74: > 72: using Allocator = HomogenousObjectArray; > 73: using LinkPtr = typename Allocator::I; > 74: LinkPtr nil() { return Allocator::nil; } could we have some clearer names? - Link -> TableEntry or Entry (since its entries of the hashtable) - LinkPtr -> TableEntryIndex Also, could you put all data members of NCCS here, please? It makes it easier to see the implementation. 
The big boys here are _table, hash table entry storage, and stack storage. I don't think you need a nil(), at least not if you keep referring to Allocator::nil elsewhere. Btw, why do you prefer using over typedef? src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 76: > 74: LinkPtr nil() { return Allocator::nil; } > 75: > 76: Allocator _allocator; "Allocator" is misleading. An allocator is typically a thing that provides alloc() and free() for you, but does not maintain the storage for you, thats up to you. Proposal: Allocator -> TableEntryStorage or EntryStorage. _allocator = _entry_storage. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1647953473 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1647951546 From duke at openjdk.org Thu Jun 20 18:16:15 2024 From: duke at openjdk.org (duke) Date: Thu, 20 Jun 2024 18:16:15 GMT Subject: Withdrawn: 8327885: runtime/Unsafe/InternalErrorTest.java enters endless loop on Alpine aarch64 In-Reply-To: References: Message-ID: On Wed, 13 Mar 2024 07:34:11 GMT, Dmitry Cherepanov wrote: > [JDK-8322163](https://bugs.openjdk.org/browse/JDK-8322163) replaced memset with a for loop on Alpine. This fixed the test on Alpine x86_64 but it enters endless loop on Alpine aarch64. > > The loop causes SIGBUS to be generated and the signal handler continues to the next instruction. As gcc generates strb with auto-increment on aarch64, the increment will be skipped. > > The patch makes the counter volatile to prevent compilers from generating strb with auto-increment. With the patch, the test passes on Alpine aarch64. This pull request has been closed without being integrated. 
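The workaround described in the withdrawn PR above can be illustrated with a minimal C++ sketch. This is not the actual test code: the function name `fill_bytes` and the `size_t` counter type are assumptions made for this example. Because the loop counter is `volatile`, the compiler has to load and store it on every iteration, which is expected to keep it from folding the increment into an auto-incrementing `strb`.

```cpp
#include <cstddef>

// Hypothetical stand-in for the fill loop that replaced memset in the test;
// the name and signature are assumptions for illustration only.
void fill_bytes(unsigned char* dst, unsigned char value, size_t len) {
  // The counter is volatile: every read and write of `i` is a real memory
  // access, so the increment stays a separate step from the byte store.
  for (volatile size_t i = 0; i < len; i = i + 1) {
    dst[i] = value;
  }
}
```

If a store faults and the signal handler resumes at the next instruction, the separate increment of `i` still runs, so the loop makes progress instead of retrying the same address forever.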
------------- PR: https://git.openjdk.org/jdk/pull/18262 From duke at openjdk.org Thu Jun 20 18:35:13 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 20 Jun 2024 18:35:13 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: <_0jtDLz3WT2dPvhlE3oi8s3pRETfC38Uvng1wwu1y3w=.406d44cf-7821-4e2c-be26-3194016ab89d@github.com> On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR. >> >> --- >> XDH.generateSecret performance >> before Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s >> >> after Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 22.071 ops/s >> >> with this PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ± 32.858 ops/s >> >> --- >> >> P256 performance with/without mult intrinsic: >> >> Performance before Montgomery PR: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 
42.438 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s >> >> Performance in master without mult() intrinsic >> >> Benchmark ... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Sandhya @ferakocz just tagging you as reminder of (the many) items in your queue :) Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2181297371 From never at openjdk.org Thu Jun 20 18:49:18 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Jun 2024 18:49:18 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v5] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 15:50:38 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: > > - Merge branch 'master' into tkr-genz > - fix spelling of JVMCI > - Merge branch 'master' into tkr-genz > - Merge remote-tracking branch 'origin/master' into tkr-genz > - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz > - Fix riscv compilation > - 8333300: [JVMCI] add support for generational ZGC > - Merge branch 'master' into tkr-genz > - Merge branch 'master' into tkr-genz > - Use NativeAccess to read from handles > - ... and 3 more: https://git.openjdk.org/jdk/compare/fad6644e...b4a82828 I've tested against latest bits and everything was clean. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2181317032 From never at openjdk.org Thu Jun 20 18:49:19 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Jun 2024 18:49:19 GMT Subject: Integrated: 8333300: [JVMCI] add support for generational ZGC In-Reply-To: References: Message-ID: On Thu, 30 May 2024 20:37:09 GMT, Tom Rodriguez wrote: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. This pull request has now been integrated. 
Changeset: 187710e1 Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/187710e1c1714ba28c7802efd4f7bb32a366d79d Stats: 245 lines in 16 files changed: 194 ins; 10 del; 41 mod 8333300: [JVMCI] add support for generational ZGC Reviewed-by: dnsimon, kvn, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/19490 From jsjolen at openjdk.org Thu Jun 20 20:06:17 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 20 Jun 2024 20:06:17 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v24] In-Reply-To: <8INosu20fiBl-PmW_AUM2KLt2emJGHH-0ENqgF6aoy8=.55fe9c14-3b7d-454c-8161-d0f6cdc2af1c@github.com> References: <8INosu20fiBl-PmW_AUM2KLt2emJGHH-0ENqgF6aoy8=.55fe9c14-3b7d-454c-8161-d0f6cdc2af1c@github.com> Message-ID: <4MgYFP3R22aMvW6RjmTPAwz5XSrufKvfRxm_-SrwdKI=.17463f91-227c-4575-bba3-0d3fc3f40897@github.com> On Thu, 20 Jun 2024 18:02:08 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename free to deallocate > > src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 74: > >> 72: using Allocator = HomogenousObjectArray; >> 73: using LinkPtr = typename Allocator::I; >> 74: LinkPtr nil() { return Allocator::nil; } > > could we have some clearer names? > > - Link -> TableEntry or Entry (since its entries of the hashtable) > - LinkPtr -> TableEntryIndex > > Also, could you put all data members of NCCS here, please? It makes it easier to see the implementation. The big boys here are _table, hash table entry storage, and stack storage. > > I don't think you need a nil(), at least not if you keep referring to Allocator::nil elsewhere. > > Btw, why do you prefer using over typedef? All suggestions sound fine > Btw, why do you prefer using over typedef? 
It's helpful that the name sticks out a lot more than for typedef when doing stuff like function pointer or array aliases, the name is always left of `=`. `using` also supports templates (you can do `template using x = array`). That's why you might prefer `using` to `typedef`. I prefer it because that's what I'm used to using, boring but true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1648068972 From jbhateja at openjdk.org Thu Jun 20 23:38:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 20 Jun 2024 23:38:17 GMT Subject: RFR: 8329032: C2 compiler register allocation support for APX EGPRs [v6] In-Reply-To: References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Tue, 18 Jun 2024 16:49:43 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 >> - 32-bit build fix. >> - Review comments resolutions. >> - jvmci test failures fixes >> - 32-bit build fixes. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 >> - Changes to skip over stack alignment gaps while popping registers using POP2 after comment from sviswa7 >> - 32 bit build fix and enforced stack alignment constraints. >> - Support new PUSH2/POP2 instructions along with Push-Pop Acceleration (PPX) to optimize register save/restore operation. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329032 >> - ... and 4 more: https://git.openjdk.org/jdk/compare/6f860f8f...8db22672 > > I start testing Thanks @vnkozlov and @sviswa7 for approvals. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19042#issuecomment-2181718623 From jbhateja at openjdk.org Thu Jun 20 23:38:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 20 Jun 2024 23:38:17 GMT Subject: Integrated: 8329032: C2 compiler register allocation support for APX EGPRs In-Reply-To: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> References: <835IiFAjWPI9AoU9j1WSAhVY2EEHzjpvWucUXTGxgUw=.cdb547b2-e1e7-4c5b-915f-7fde46e26c81@github.com> Message-ID: On Wed, 1 May 2024 18:42:13 GMT, Jatin Bhateja wrote: > Intel® Advanced Performance Extensions (Intel® APX) adds 16 new 64 bit general purpose register also known as Extended General Purpose Registers in IA-32e 64 bit mode. > > Summary of changes introduced along with this patch:- > > 1. C2 compiler register allocation support. > 2. Architecture state save restoration while transitioning from C1/C2 JIT compiled code to runtime services. > 3. Support new PUSH2/POP2 instructions along with push-pop acceleration hints (PPX) to optimize register save/restore operation. > 4. Applicable extensions to native interface used by runtime for patching instruction. > > We plan to address C1 register support in subsequent patch as there are hard upper bound allocation limits > (currently set to r11) imposed by existing implementation of linear scan algorithm after which it reserves > remaining register for special purpose. > > Patch has been regressed over stand alone test points after prioritizing EGPR allocations over existing GPR register by manually modifying the register sequences in relevant allocation class. > > We plan to do thorough validation using [Intel's SDE](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html) during course of time and release incremental patches for bug fixes > found during testing. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin This pull request has now been integrated. Changeset: e5de26dd Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/e5de26ddf0550da9e6d074d5b9ab4a943170adca Stats: 790 lines in 26 files changed: 605 ins; 53 del; 132 mod 8329032: C2 compiler register allocation support for APX EGPRs Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/19042 From jiangli at openjdk.org Fri Jun 21 00:30:18 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 00:30:18 GMT Subject: RFR: 8333268: Fixes for static build [v2] In-Reply-To: <0dEUfxGGkUTfm3TPCNbBxREmGZScyLCXwKv9-7AFf3M=.b69446a9-0828-4a99-a677-8f948ea612b6@github.com> References: <0dEUfxGGkUTfm3TPCNbBxREmGZScyLCXwKv9-7AFf3M=.b69446a9-0828-4a99-a677-8f948ea612b6@github.com> Message-ID: On Tue, 18 Jun 2024 17:57:29 GMT, Magnus Ihse Bursie wrote: >> Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into static-linking-progress >> - Merge branch 'master' into static-linking-progress >> - Move the exported JVM_IsStaticallyLinked to a better location >> - Use runtime lookup of static vs dynamic instead of #ifdef STATIC_BUILD >> - Copy fix for init_system_properties_values on linux >> - Make sure we do not try to build static libraries on Windows >> - 8333268: Fixes for static build > > src/hotspot/os/linux/os_linux.cpp line 605: > >> 603: >> 604: // Get rid of /{client|server|hotspot}, if binary is libjvm.so. >> 605: // Or, cut off /. > > @jianglizhou This code is based on changes in the Hermetic Java repo, but I do not fully understand neither the comment nor what the purpose is. If you could explain this a bit I'd be grateful. 
The specific related commit in the hermetic Java branch is https://github.com/openjdk/leyden/commit/53aa8f0cf418ab5f435a4b9996c7754fb8505d4b. The change in os_linux.cpp here is to make sure that the libjvm.so related path manipulation is only done conditionally. The check at line 599 looks for the "/libjvm.so" substring, so we only chop off (`*pslash = '\0'` at line 601) that part when necessary. In the static JDK case, there is no `libjvm.so` and the path string is `/bin/javastatic`, which should not be affected. Otherwise, it could fail. I found the code was not very easy to follow when running into problems and fixing for static support. So I added a few more comments in the code here. The comment above about `/{client|server|hotspot}` was there originally. I think we no longer have those directories. We can clean that up later, since it needs some more testing. @magicus, thanks a lot for extracting/reworking/cleaning-up the static Java changes from the hermetic Java branch. That's a substantial amount of work! I have one quick comment about the removal of `STATIC_LIB_EXCLUDE_OBJS` changes. Will post it as a separate comment for the related code. I'll also look closely at the vm & jdk changes and compare those with the changes in the hermetic Java branch this week. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1648283151 From jiangli at openjdk.org Fri Jun 21 00:48:12 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 00:48:12 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. 
This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows make/modules/java.base/lib/CoreLibraries.gmk line 148: > 146: $(LIBJLI_EXTRA_FILE_LIST)) > 147: > 148: # Do not include these libz objects in the static libjli library. Why this is no longer needed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1648290693 From jiangli at openjdk.org Fri Jun 21 00:51:17 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 00:51:17 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. 
>> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows make/modules/java.desktop/lib/AwtLibraries.gmk line 257: > 255: JDK_LIBS := libawt java.base:libjava, \ > 256: LIBS_unix := $(LIBDL) $(LIBM) $(X_LIBS) -lX11 -lXext -lXi -lXrender \ > 257: -lXtst, \ Same question as for the STATIC_LIB_EXCLUDE_OBJS change with `libjli`. Are the duplicate symbol issues resolved by symbol hiding? I think it's still better to not include those .o files to avoid unnecessary footprint overhead in the binary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1648292220 From jiangli at openjdk.org Fri Jun 21 00:54:12 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 00:54:12 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: <1fgMACYXrz2wEFaJ22HNdkjUu5MGpVjUu7oG14oFvzc=.7f6bb2d8-4ad8-4e2c-a781-182e91908d07@github.com> On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. 
>> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows make/common/native/Link.gmk line 72: > 70: define CreateStaticLibrary > 71: # Include partial linking when building the static library with clang on linux > 72: ifeq ($(call isTargetOs, linux macosx), true) Is this mainly for `clang` support for now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1648293391 From fyang at openjdk.org Fri Jun 21 02:19:14 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Jun 2024 02:19:14 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length In-Reply-To: References: Message-ID: <4q1skSGwBys--eq_ph1CkoEz3WlBLAyCHauNP3ZNUEs=.39943e95-cb84-43f9-bd6e-b269519d227c@github.com> On Wed, 19 Jun 2024 04:21:24 GMT, Gui Cao wrote: > HI, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. This may lead to several jtreg test failures, see jbs issue for exception information. > > The reason for this problem is that when spill vector registers into memory, the whole width of the register is used incorrectly, and MaxVectorSize should be used to handle the number of elements spill. > > https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 > > PR propose to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: > 1. 
The CSR_VLENB register of RISC-V is read-only, we can't change it to MaxVectorSize like like aarch64. > 2. It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might bring negative impact on performance. > 3. If MaxVectorSize equals to VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. > > After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: > > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=64 -XX:+PrintFlagsFinal -vers... Changes requested by fyang (Reviewer). 
src/hotspot/cpu/riscv/vm_version_riscv.cpp line 331: > 329: } else { > 330: if (!FLAG_IS_DEFAULT(MaxVectorSize) && MaxVectorSize != _initial_vector_length) { > 331: warning("MaxVectorSize is set to %i on this platform", _initial_vector_length); Suggestion: `warning("Current system does not support RVV vector length for MaxVectorSize %d. Set MaxVectorSize to %d", (int)MaxVectorSize, _initial_vector_length)` ------------- PR Review: https://git.openjdk.org/jdk/pull/19785#pullrequestreview-2131666362 PR Review Comment: https://git.openjdk.org/jdk/pull/19785#discussion_r1648332116 From gcao at openjdk.org Fri Jun 21 02:46:39 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 21 Jun 2024 02:46:39 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length [v2] In-Reply-To: References: Message-ID: > HI, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. This may lead to several jtreg test failures, see jbs issue for exception information. > > The reason for this problem is that when spill vector registers into memory, the whole width of the register is used incorrectly, and MaxVectorSize should be used to handle the number of elements spill. > > https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 > > PR propose to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: > 1. The CSR_VLENB register of RISC-V is read-only, we can't change it to MaxVectorSize like like aarch64. > 2. It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might bring negative impact on performance. > 3. 
If MaxVectorSize equals to VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. > > After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: > > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=64 -XX:+PrintFlagsFinal -vers... 
Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Update warning log ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19785/files - new: https://git.openjdk.org/jdk/pull/19785/files/096354b6..16428e74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19785&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19785&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19785/head:pull/19785 PR: https://git.openjdk.org/jdk/pull/19785 From gcao at openjdk.org Fri Jun 21 02:49:09 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 21 Jun 2024 02:49:09 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length [v2] In-Reply-To: <4q1skSGwBys--eq_ph1CkoEz3WlBLAyCHauNP3ZNUEs=.39943e95-cb84-43f9-bd6e-b269519d227c@github.com> References: <4q1skSGwBys--eq_ph1CkoEz3WlBLAyCHauNP3ZNUEs=.39943e95-cb84-43f9-bd6e-b269519d227c@github.com> Message-ID: On Fri, 21 Jun 2024 02:16:01 GMT, Fei Yang wrote: > Suggestion: `warning("Current system does not support RVV vector length for MaxVectorSize %d. Set MaxVectorSize to %d", (int)MaxVectorSize, _initial_vector_length)` Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19785#discussion_r1648347922 From gcao at openjdk.org Fri Jun 21 03:08:22 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 21 Jun 2024 03:08:22 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length [v3] In-Reply-To: References: Message-ID: > HI, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. 
This may lead to several jtreg test failures, see jbs issue for exception information. > > The reason for this problem is that when spill vector registers into memory, the whole width of the register is used incorrectly, and MaxVectorSize should be used to handle the number of elements spill. > > https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 > > PR propose to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: > 1. The CSR_VLENB register of RISC-V is read-only, we can't change it to MaxVectorSize like like aarch64. > 2. It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might bring negative impact on performance. > 3. If MaxVectorSize equals to VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. 
> > After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: > > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=64 -XX:+PrintFlagsFinal -vers... 
Gui Cao has updated the pull request incrementally with two additional commits since the last revision: - Polishing - Update warning log ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19785/files - new: https://git.openjdk.org/jdk/pull/19785/files/16428e74..75fefaf2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19785&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19785&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19785/head:pull/19785 PR: https://git.openjdk.org/jdk/pull/19785 From fyang at openjdk.org Fri Jun 21 03:08:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Jun 2024 03:08:22 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length [v3] In-Reply-To: References: Message-ID: On Fri, 21 Jun 2024 03:06:06 GMT, Gui Cao wrote: >> HI, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. This may lead to several jtreg test failures, see jbs issue for exception information. >> >> The reason for this problem is that when spill vector registers into memory, the whole width of the register is used incorrectly, and MaxVectorSize should be used to handle the number of elements spill. >> >> https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 >> >> PR propose to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: >> 1. The CSR_VLENB register of RISC-V is read-only, we can't change it to MaxVectorSize like like aarch64. >> 2. 
It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might bring negative impact on performance. >> 3. If MaxVectorSize equals to VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. >> >> After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: >> >> zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize >> OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform >> intx MaxVectorSize = 32 {C2 product} {command line} >> openjdk version "24-internal" 2025-03-18 >> OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) >> zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize >> intx MaxVectorSize = 32 {C2 product} {command line} >> openjdk version "24-internal" 2025-03-18 >> OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) >> zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./ja... > > Gui Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Polishing > - Update warning log Marked as reviewed by fyang (Reviewer). 
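As a rough illustration of the behaviour discussed in this thread, the clamp-and-warn logic might be modelled as below. This is a sketch under stated assumptions, not the actual `vm_version_riscv.cpp` code: `resolve_max_vector_size`, `VectorSizeResult`, and the parameter names are hypothetical, and the real code has further conditions that are not modelled here. The warning text follows the wording suggested in the review.

```cpp
#include <cstdio>
#include <string>

// Hypothetical result type: the value the VM would end up using, plus an
// optional warning (empty when nothing needs to be reported).
struct VectorSizeResult {
  int max_vector_size;
  std::string warning;
};

// Hypothetical model of the check quoted in the review.
//   flag_is_default:  true when -XX:MaxVectorSize was not given by the user
//   requested:        the value of the MaxVectorSize flag
//   hw_vector_length: vector register length in bytes detected at startup
VectorSizeResult resolve_max_vector_size(bool flag_is_default,
                                         int requested,
                                         int hw_vector_length) {
  // The result always uses the detected hardware length.
  VectorSizeResult result{hw_vector_length, ""};
  // Warn only when the user explicitly asked for a different value.
  if (!flag_is_default && requested != hw_vector_length) {
    char buf[160];
    snprintf(buf, sizeof(buf),
             "Current system does not support RVV vector length for "
             "MaxVectorSize %d. Set MaxVectorSize to %d",
             requested, hw_vector_length);
    result.warning = buf;
  }
  return result;
}
```

With `requested = 16` and `hw_vector_length = 32`, the sketch returns 32 together with a warning, which matches the `-XX:MaxVectorSize=16` run quoted in the PR description.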
------------- PR Review: https://git.openjdk.org/jdk/pull/19785#pullrequestreview-2131703076 From fyang at openjdk.org Fri Jun 21 03:30:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Jun 2024 03:30:11 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v12] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 11:17:19 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/codeBuffer_riscv.cpp line 74: >> >>> 72: >>> 73: assert(requests->number_of_entries() >= 1, "at least one"); >>> 74: const int total_requested_size = MacroAssembler::max_patchable_far_call_stub_size() * requests->number_of_entries(); >> >> I see mixed uses of bot `MacroAssembler::max_patchable_far_call_stub_size()` and `MacroAssembler::NativeShortCall::trampoline_size` in this function. As this is only used under `UseTrampolines`, seems more reasonable to use `MacroAssembler::NativeShortCall::trampoline_size`. > > As you see in diff the mixing is pre-exsisting, I only changed names. > > Fixed. Ah, I see. I think you are right in using `MacroAssembler::max_patchable_far_call_stub_size()` at places where we call `MacroAssembler::max_trampoline_stub_size()` previously. Could you please revert this part? I think I miss-read the code before. Sorry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1648369390 From jiefu at openjdk.org Fri Jun 21 06:21:16 2024 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 21 Jun 2024 06:21:16 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v5] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 15:50:38 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. 
This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names.
>>
>> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core.
>
> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>
> - Merge branch 'master' into tkr-genz
> - fix spelling of JVMCI
> - Merge branch 'master' into tkr-genz
> - Merge remote-tracking branch 'origin/master' into tkr-genz
> - Merge branch 'tkr-genz' of github.com:tkrodriguez/jdk into tkr-genz
> - Fix riscv compilation
> - 8333300: [JVMCI] add support for generational ZGC
> - Merge branch 'master' into tkr-genz
> - Merge branch 'master' into tkr-genz
> - Use NativeAccess to read from handles
> - ... and 3 more: https://git.openjdk.org/jdk/compare/fad6644e...b4a82828

Hi, please see: https://github.com/openjdk/jdk/pull/19818
Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2182079733

From fyang at openjdk.org Fri Jun 21 07:39:14 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 21 Jun 2024 07:39:14 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v14]
In-Reply-To:
References:
Message-ID: <6EvPetzLpHyHVD5tFoYg19hx9wbAkw1Pi3LoZFSp9yY=.a7dd6cef-532c-4a42-a09a-4a81c04e09a7@github.com>

On Thu, 20 Jun 2024 11:24:24 GMT, Robbin Ehn wrote:

>> Hi all, please consider!
>>
>> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
>> With a very small application, or one running for a very short time, we have fast patchable calls.
>> But any normal application running longer will increase the code size and code churn/fragmentation.
>> So whether or not you get hot fast calls relies on luck.
>>
>> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
>> This would be the common case for a patchable call.
>>
>> Code stream:
>> JAL
>> Stubs:
>> AUIPC
>> LD
>> JALR
>>
>>
>>
>> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way.
>> Loading the address avoids the pitfalls of cmodx.
>>
>> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
>> and instead do by default:
>>
>> Code stream:
>> AUIPC
>> LD
>> JALR
>> Stubs:
>>
>>
>> An experimental option for turning trampolines back on exists.
>>
>> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of issues which arm has) when in reach, and vice versa.
>>
>> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>>
>> fop (msec)           2239  | 2128  = 0.950424
>> h2 (msec)            18660 | 16594 = 0.889282
>> jython (msec)        22022 | 21925 = 0.995595
>> luindex (msec)       2866  | 2842  = 0.991626
>> lusearch (msec)      4108  | 4311  = 1.04942
>> lusearch-fix (msec)  4406  | 4116  = 0.934181
>> pmd (msec)           5976  | 5897  = 0.98678
>> jython (msec)        22022 | 21925 = 0.995595
>> Avg: 0.974112
>> fop(xcomp) (msec)    2721  | 2714  = 0.997427
>> h2(xcomp) ...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
>
> - Minor review comments
> - Merge branch 'master' into 8332689
> - To be pushed
> - Merge branch 'master' into 8332689
> - Review comments, removed dead code.
> - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - ... and 12 more: https://git.openjdk.org/jdk/compare/d7dad50a...e47f2454 src/hotspot/cpu/riscv/compiledIC_riscv.cpp line 74: > 72: // Somewhat pessimistically, we count 4 instructions here (although > 73: // there are only 3) because we sometimes emit an alignment nop. > 74: // Trampoline stubs are always word aligned. Seems the code comment needs update to reflect this change. src/hotspot/cpu/riscv/riscv.ad line 1244: > 1242: return 1 * NativeInstruction::instruction_size; // jal > 1243: } > 1244: return 3 * NativeInstruction::instruction_size; // auipc + ld + jalr Question: As we will only patch the address in the stub, do we still need the handling in compute_padding (`CallStaticJavaDirectNode::compute_padding` & `CallDynamicJavaDirectNode::compute_padding`) when `UseTrampolines` is false? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1648491039 PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1648542886 From stuefe at openjdk.org Fri Jun 21 10:02:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 21 Jun 2024 10:02:15 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v3] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 06:37:32 GMT, Thomas Stuefe wrote: >> Arenas carry NMT flags. >> >> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. >> >> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. 
>> >> The patch does that: >> - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) >> - CompilerThread hands in mtCompiler, all other threads rely on the default >> - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in >> - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena >> - it also allows us to make Arena::flags private >> >> Other, unrelated cleanups: >> - Made Arena::_size_in_bytes and Arena::_tag private >> - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor >> - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. >> >> Tests: >> >> I manually verified that the NMT numbers printed don't change. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - feedback david > - Merge branch 'master' into arena-constify-memflags > - feedback johan > - Merge branch 'master' into arena-constify-memflags > - start @gerard-ziemski or @afshin-zafari could you give this a glance, please? Its a cleanup related to NMT. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19693#issuecomment-2182427967 From coleenp at openjdk.org Fri Jun 21 17:11:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 21 Jun 2024 17:11:20 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work Message-ID: Revert the change for JDK-8288064 "Class initialization locking". 
JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. Tested with tier1-7. ------------- Commit messages: - NULL -> nullptr - NULL -> nullptr - Include synchronizer.hpp in instanceKlass.cpp - Add a test contributed by Chris Plummer. - Include synchronizer.hpp in cpCache.cpp - Fix JVMCI again. - Revert "8288064: Class initialization locking" Changes: https://git.openjdk.org/jdk/pull/19755/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19755&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333542 Stats: 516 lines in 16 files changed: 339 ins; 129 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/19755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19755/head:pull/19755 PR: https://git.openjdk.org/jdk/pull/19755 From coleenp at openjdk.org Fri Jun 21 17:11:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 21 Jun 2024 17:11:20 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. > > Tested with tier1-7. Chris wrote the test. 
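The ordering requirement described in this thread can be illustrated with a small stand-alone sketch. It is not HotSpot code: `KlassModel`, `link_and_post`, and the event log are hypothetical names, and a `std::mutex` stands in for the Java-level init lock. The only point is that the state change to 'linked' and the posting of the prepare event happen under one lock, so another thread taking the same lock cannot observe 'linked' before the event has been posted.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

enum class ClassState { loaded, linked };

// Hypothetical model of a class whose link step must be atomic with
// respect to the "class prepare" notification.
class KlassModel {
 public:
  // Sets the state to linked and records the prepare event while holding
  // the same lock for both steps. A second call is a no-op.
  void link_and_post() {
    std::lock_guard<std::mutex> guard(_init_lock);
    if (_state == ClassState::linked) {
      return;
    }
    _state = ClassState::linked;
    _events.push_back("class prepare");
  }

  // Readers take the same lock, so they never see the linked state
  // without the corresponding event.
  bool is_linked() {
    std::lock_guard<std::mutex> guard(_init_lock);
    return _state == ClassState::linked;
  }

  std::size_t event_count() {
    std::lock_guard<std::mutex> guard(_init_lock);
    return _events.size();
  }

 private:
  std::mutex _init_lock;                    // stand-in for the init lock
  ClassState _state = ClassState::loaded;
  std::vector<std::string> _events;         // stand-in for posted events
};
```

If the lock were released between the state change and the event, a reader could see `is_linked()` return true while `event_count()` is still zero, which is the kind of window the revert is meant to close.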
------------- PR Comment: https://git.openjdk.org/jdk/pull/19755#issuecomment-2183125662 From mdoerr at openjdk.org Fri Jun 21 17:53:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Jun 2024 17:53:13 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:11:26 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > fixes the test case This looks like a correct and direct port from aarch64. I'm fine with it. Testing has passed. I'll look at the performance, too. src/hotspot/cpu/ppc/vtableStubs_ppc_64.cpp line 3: > 1: /* > 2: * Copyright (c) 1997, 2023, Oracle and/or its affiliates. All rights reserved. > 3: * Copyright (c) 2012, 2023 SAP SE. All rights reserved. Copyright year should be 2024 for both, SAP and Oracle. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19733#pullrequestreview-2133159909 PR Review Comment: https://git.openjdk.org/jdk/pull/19733#discussion_r1649270420 From amitkumar at openjdk.org Fri Jun 21 18:00:30 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 21 Jun 2024 18:00:30 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v4] In-Reply-To: References: Message-ID: > PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: updates copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19733/files - new: https://git.openjdk.org/jdk/pull/19733/files/e7c60c71..19b67224 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19733&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19733&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19733.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19733/head:pull/19733 PR: https://git.openjdk.org/jdk/pull/19733 From amitkumar at openjdk.org Fri Jun 21 18:04:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 21 Jun 2024 18:04:12 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v3] In-Reply-To: References: Message-ID: On Fri, 21 Jun 2024 17:49:12 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> fixes the test case > > src/hotspot/cpu/ppc/vtableStubs_ppc_64.cpp line 3: > >> 1: /* >> 2: * Copyright (c) 1997, 2023, Oracle and/or its affiliates. All rights reserved. >> 3: * Copyright (c) 2012, 2023 SAP SE. All rights reserved. > > Copyright year should be 2024 for both, SAP and Oracle. I don't know why I changed it to 2023 ?; but anyway fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19733#discussion_r1649281465 From jiangli at openjdk.org Fri Jun 21 18:40:11 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 18:40:11 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows src/hotspot/os/bsd/os_bsd.cpp line 1: > 1: /* The changes in os_bsd.cpp are new and are not from https://github.com/openjdk/leyden/tree/hermetic-java-runtime/. Have you tested the bsd port? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1649312808 From mdoerr at openjdk.org Fri Jun 21 19:05:12 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 21 Jun 2024 19:05:12 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v4] In-Reply-To: References: Message-ID: On Fri, 21 Jun 2024 18:00:30 GMT, Amit Kumar wrote: >> PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > updates copyright header Thanks! I'm convinced that it is correct, but the performance seems to be slightly better without this patch:

Without patch (measured on linux ppc64le Power10):

Benchmark                        Mode  Cnt   Score   Error  Units
InterfaceCalls.test1stInt2Types  avgt   12  15.298 ± 0.162  ns/op
InterfaceCalls.test1stInt3Types  avgt   12  19.570 ± 0.324  ns/op
InterfaceCalls.test1stInt5Types  avgt   12  19.464 ± 0.353  ns/op
InterfaceCalls.test2ndInt2Types  avgt   12  15.066 ± 0.078  ns/op
InterfaceCalls.test2ndInt3Types  avgt   12  19.347 ± 0.292  ns/op
InterfaceCalls.test2ndInt5Types  avgt   12  20.407 ± 2.015  ns/op
InterfaceCalls.testIfaceCall     avgt   12  19.268 ± 0.134  ns/op
InterfaceCalls.testIfaceExtCall  avgt   12  19.890 ± 0.103  ns/op
InterfaceCalls.testMonomorphic   avgt   12  12.382 ± 0.081  ns/op

With patch:

Benchmark                        Mode  Cnt   Score   Error  Units
InterfaceCalls.test1stInt2Types  avgt   12  15.752 ± 0.472  ns/op
InterfaceCalls.test1stInt3Types  avgt   12  20.207 ± 0.307  ns/op
InterfaceCalls.test1stInt5Types  avgt   12  20.070 ± 0.387  ns/op
InterfaceCalls.test2ndInt2Types  avgt   12  15.692 ± 0.189  ns/op
InterfaceCalls.test2ndInt3Types  avgt   12  20.770 ± 0.598  ns/op
InterfaceCalls.test2ndInt5Types  avgt   12  20.482 ± 0.242  ns/op
InterfaceCalls.testIfaceCall     avgt   12  19.963 ± 0.344  ns/op
InterfaceCalls.testIfaceExtCall  avgt   12  20.165 ± 0.559  ns/op
InterfaceCalls.testMonomorphic   avgt   12  12.400 ± 0.083  ns/op

I'm sorry to say this, but this change should not get integrated without improving performance which is the whole reason why it was done. I guess the old code uses slightly better instruction sequences which more than compensate the overhead of iterating twice. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2183301144 From jiangli at openjdk.org Fri Jun 21 19:05:13 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 19:05:13 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows src/hotspot/share/runtime/os.cpp line 521: > 519: char ebuf[1024]; > 520: > 521: if (vm_is_statically_linked()) { This block can be moved before the two variable declarations above, since they are not needed in the static case.
https://github.com/openjdk/leyden/blob/c1c5fc686c1452550e1b3663a320fba652248505/src/hotspot/share/runtime/os.cpp#L507 handles it that way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1649333382 From jiangli at openjdk.org Fri Jun 21 19:16:11 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 19:16:11 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows src/hotspot/share/utilities/zipLibrary.cpp line 63: > 61: > 62: static void* dll_lookup(const char* name, const char* path, bool vm_exit_on_failure) { > 63: if (vm_is_statically_linked()) { I like this change. It is cleaner than the hermetic Java branch change that does the `if` static check in `store_function_pointers` (https://github.com/openjdk/leyden/blob/c1c5fc686c1452550e1b3663a320fba652248505/src/hotspot/share/utilities/zipLibrary.cpp#L75). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1649341844 From jiangli at openjdk.org Fri Jun 21 19:23:11 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 19:23:11 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows src/java.base/macosx/native/libjli/java_md_macosx.m line 1: > 1: /* In the mailing list discussion thread on hermetic Java, you mentioned that running on macosx with a build from the hermetic Java branch crashed for you during startup. Is that fully resolved with the changes in this PR? The hermetic Java branch does not have any changes for the macosx port. What tests are done for the macosx port for static support?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1649348777 From jiangli at openjdk.org Fri Jun 21 19:34:11 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 19:34:11 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows src/java.base/unix/native/libjli/java_md.c line 316: > 314: SetExecname(*pargv); > 315: > 316: if (!JLI_IsStaticallyLinked()) { Any reason this is diverted from the change in hermetic Java branch, https://github.com/openjdk/leyden/blob/c1c5fc686c1452550e1b3663a320fba652248505/src/java.base/unix/native/libjli/java_md.c#L300? I think the setenv part below is not needed for static/hermetic support either. There is no $JRE/lib with a single executable image. All natives are statically linked with the executable image. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19478#discussion_r1649356484 From jiangli at openjdk.org Fri Jun 21 19:54:11 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 21 Jun 2024 19:54:11 GMT Subject: RFR: 8333268: Fixes for static build [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:15:43 GMT, Magnus Ihse Bursie wrote: >> This patch contains a set of changes to improve static builds. They will pave the way for implementing a full static-only java launcher. The changes here will: >> >> 1) Make sure non-exported symbols are made local in the static libraries. This means that the risk of symbol conflict is the same for static libraries as for dynamic libraries (i.e. in practice zero, as long as a consistent naming scheme is used for exported functions). >> >> 2) Remove the work-arounds to exclude duplicated symbols. >> >> 3) Fix some code in hotspot and the JDK libraries that did not work properly with a static java launcher. >> >> The latter fixes are copied from or inspired by the work done by @jianglizhou and her team as part of the Project Leyden [Hermetic Java](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Add dummy implementation of os::lookup_function for Windows I've looked through all JDK and VM changes and left comments in various places. All the remaining changes in the PR look good. Thanks again for extracting these changes from the leyden/hermeticJava branch and integrating with mainline! My other main question is why the `javastatic` linking work is not included in the PR together with these runtime changes. IIUC from our meetings and mailing list discussions, the initial integration PR needs to include the part for statically linking the `javastatic`.
That's a minimum requirement for testing/verifying the runtime changes when integrating into the mainline, which is also the reason why we haven't started integrating any of the runtime changes so far. Has that been changed? ------------- PR Review: https://git.openjdk.org/jdk/pull/19478#pullrequestreview-2133328296 From gziemski at openjdk.org Fri Jun 21 20:54:18 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 21 Jun 2024 20:54:18 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v3] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 09:41:48 GMT, Thomas Stuefe wrote: >> Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - fix windows build > - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information > - caching > - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information > - exclude macos from testing source info > - copyrights > - test > - JDK-8333994-NMT-call-stacks-should-show-source-information Looks good overall, I do have a few small questions/suggestions. src/hotspot/share/nmt/memReporter.hpp line 157: > 155: MemDetailReporter(MemBaseline& baseline, outputStream* output, size_t scale = default_scale) : > 156: MemSummaryReporter(baseline, output, scale), > 157: _baseline(baseline), _stackprinter(output) { } `NativeCallStackPrinter` is 293* 1024 == 293KB Isn't there a way to create/destroy it on the fly? NMT is already memory hungry, we just keep increasing the price of admission ... src/hotspot/share/utilities/nativeCallStack.cpp line 130: > 128: // cached?
> 129: bool created = false; > 130: Entry* const cached_value = _cache.put_if_absent(pc, &created); `cached_value` sounds generic, could we have `cached_stream_text` or something more descriptive? src/hotspot/share/utilities/nativeCallStack.cpp line 137: > 135: stack->print_frame(&ss, pc); > 136: _out->print_raw_cr(cached_value->text); > 137: } We could simplify to:

    if (created) {
      stringStream ss(cached_value->text, sizeof(cached_value->text));
      stack->print_frame(&ss, pc);
    }
    _out->print_raw_cr(cached_value->text);

src/hotspot/share/utilities/nativeCallStack.hpp line 136: > 134: // a NativeCallStackPrinter improves performance by caching printed frames by address. > 135: class NativeCallStackPrinter { > 136: struct Entry { char text[1024]; }; 1024 is big enough? Would 512 suffice? Not that important if we were creating/destroying it as needed, but right now it is always there... ------------- Changes requested by gziemski (Committer). PR Review: https://git.openjdk.org/jdk/pull/19655#pullrequestreview-2133388645 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1649414362 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1649418281 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1649409549 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1649416879 From dlong at openjdk.org Fri Jun 21 22:31:21 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Jun 2024 22:31:21 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since.
> > Tested with tier1-7. There may be a simpler way to fix this. What if we pretend that any class with a deferred breakpoint automatically has a static initializer? Then as part of the "injected" static initializer, the class does the prepare event and sets up the breakpoint. In the mean time, other threads trying to execute methods in that class are blocked because of the static initializer protocol. I don't think this introduces new deadlocks unless the prepare event does something strange. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19755#issuecomment-2183542289 From coleenp at openjdk.org Fri Jun 21 23:39:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 21 Jun 2024 23:39:09 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. > > Tested with tier1-7. I had a version of the fix that you described and it works well with the test case. Unfortunately, there are other class prepare events that are expected at link time before the class is initialized. Luckily we have a robust set of tests. One such test is vmTestbase/nsk/jvmti/scenarios/events/EM01/em01t001. If we want to remove the ObjectLocker in a future release, I think the RecursiveLock mechanism would be the way to do it. I chose a backout for this because it's also broken in JDK 21 and Chris thinks it's important to fix there also. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19755#issuecomment-2183582057 From coleenp at openjdk.org Sat Jun 22 00:21:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 22 Jun 2024 00:21:10 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v3] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Wed, 19 Jun 2024 15:06:25 GMT, Axel Boldt-Christmas wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. 
> > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Rename and comment SystemDictionary::methods_do So the problem with the no-keepalive names is that for these functions it's very unclear what exactly you're keeping alive, unless you're thinking about concurrent collection GC, and even then it's a puzzler. I don't see how it's going to help prevent more bugs. I don't yet understand the current bug. The 'resolve' for the CLDG iterator was to temporarily keep that CLD from being unloaded, in the short time that we're iterating on that particular CLD. The CLDG_lock was added later and it may be that that's what prevents the CLD from being *deleted* while we are holding the lock while iterating. So that keeps the CLD metadata alive. The oops in the CLD are not kept alive unless we have resolved the holder which the GC sees as the root. So if we are accessing oops from the CLD, we need to use the keepalive version. It's not really keeping the CLD alive. It's making the oops alive, or accessible until they are handled somewhere by the caller. So the no-keepalive is not a great name for this, even though it's used when accessing oops in other places. I think the versions of these functions that access oops that need to be kept alive should have a different name, like modules_do_keeping_oops_alive(). But that's a sentence name. Maybe modules_do_keepalive() can be read as keeping oops alive. But classes_do_no_keepalive() is a terrible name because we want the class metadata to be kept alive and the name is disconcerting. Usually I prefer good names to good comments, but here I think good comments are better. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2133534047 From amitkumar at openjdk.org Sat Jun 22 03:38:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 22 Jun 2024 03:38:19 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v4] In-Reply-To: References: Message-ID: On Fri, 21 Jun 2024 19:02:56 GMT, Martin Doerr wrote: > I guess the old code uses slightly better instruction sequences which more than compensate the overhead of iterating twice. Similar issue I'm facing for s390x as well. We are not getting that much performance improvement as aarch64 & x86 are showing. I'll look further into it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2183758743 From stuefe at openjdk.org Sat Jun 22 05:49:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 22 Jun 2024 05:49:15 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v3] In-Reply-To: References: Message-ID: On Fri, 21 Jun 2024 20:45:03 GMT, Gerard Ziemski wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - fix windows build >> - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information >> - caching >> - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information >> - exclude macos from testing source info >> - copyrights >> - test >> - JDK-8333994-NMT-call-stacks-should-show-source-information > > src/hotspot/share/nmt/memReporter.hpp line 157: > >> 155: MemDetailReporter(MemBaseline& baseline, outputStream* output, size_t scale = default_scale) : >> 156: MemSummaryReporter(baseline, output, scale), >> 157: _baseline(baseline), _stackprinter(output) { } > > `NativeCallStackPrinter` is 293* 1024 == 293KB > > Isn't there a way to create/destroy it on the fly? NMT is already memory hungry, we just keep increasing the price of admission ... MemDetailReporter is only created for a detail report and does not live beyond reporting. The printer cache is needed for the duration of the report. Simplest way to do this is an inline member in the reporter class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1649570291 From stuefe at openjdk.org Sat Jun 22 06:02:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 22 Jun 2024 06:02:20 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v3] In-Reply-To: References: Message-ID: On Fri, 21 Jun 2024 20:50:17 GMT, Gerard Ziemski wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - fix windows build >> - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information >> - caching >> - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information >> - exclude macos from testing source info >> - copyrights >> - test >> - JDK-8333994-NMT-call-stacks-should-show-source-information > > src/hotspot/share/utilities/nativeCallStack.cpp line 130: > >> 128: // cached? >> 129: bool created = false; >> 130: Entry* const cached_value = _cache.put_if_absent(pc, &created); > > `cached_value` sounds generic, could we have `cached_stream_text` or something more descriptive? Replaced it with cached_frame_text > src/hotspot/share/utilities/nativeCallStack.hpp line 136: > >> 134: // a NativeCallStackPrinter improves performance by caching printed frames by address. >> 135: class NativeCallStackPrinter { >> 136: struct Entry { char text[1024]; }; > > 1024 is big enough? Would 512 suffice? > > Not that important if we were creating/destroying it as needed, but right now it is always there... Frame strings can get very lengthy. Think templatized function names, or long file names. In theory, 1024 can be not enough. I first wanted to do this via strdup, but that would mean that we use twice as many mallocs, copy the strings around unnecessarily, and need to manually deallocate after reporting. So a fixed-size entry is a compromise for simplicity. Note that this is only used during detail reports. Those are memory-hungry anyway, since we snapshot the whole state. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1649572271 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1649572192 From stuefe at openjdk.org Sat Jun 22 10:32:43 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 22 Jun 2024 10:32:43 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v4] In-Reply-To: References: Message-ID: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - increase init buffer - small rework - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - fix windows build - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - caching - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - exclude macos from testing source info - copyrights - test - ... 
and 1 more: https://git.openjdk.org/jdk/compare/3df805fc...2bcc5bd1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19655/files - new: https://git.openjdk.org/jdk/pull/19655/files/0c8e98ea..2bcc5bd1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=02-03 Stats: 6252 lines in 163 files changed: 2271 ins; 3399 del; 582 mod Patch: https://git.openjdk.org/jdk/pull/19655.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19655/head:pull/19655 PR: https://git.openjdk.org/jdk/pull/19655 From stuefe at openjdk.org Sat Jun 22 10:37:27 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 22 Jun 2024 10:37:27 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v5] In-Reply-To: References: Message-ID: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19655/files - new: https://git.openjdk.org/jdk/pull/19655/files/2bcc5bd1..c6df7e1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=03-04 Stats: 9 lines in 2 files changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19655.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19655/head:pull/19655 PR: https://git.openjdk.org/jdk/pull/19655 From stuefe at openjdk.org Sat Jun 22 10:37:27 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 22 Jun 2024 10:37:27 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v5] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 20:41:01 GMT, Gerard Ziemski wrote: >>> >>> I have the same question. Did dwarf decoder performance improve? 
If so, could you point me the PR? Thanks! >> >> I completely forgot that this had been an issue. The comment was even written by me :( >> >> No, Elf decoder is still slow. But I have found myself too many times staring at NMT output now trying to make sense of the offsets. Missing source info in combination with the small stack size of 4 makes investigations a pain. >> >> I added a simple caching mechanism to aid printing. Its pretty straight-forward, but still I am not sure it is worth the complexity. Here the numbers: >> >> Running all NMT jtreg tests: >> - Stock JVM (no source info): 40 seconds >> - Source info: 2 min 30 seconds >> - Source info + caching: 1 min 15 seconds >> >> I think that is acceptable. Any more intricate caching would be over the complexity-benefit line. >> >> @gerard-ziemski >> >> The cost is with Dwarf parsing, not dladdr. dladdr is cheap. But feel free to make Dwarf parsing cheaper, that would be surely welcome. > >> > I have the same question. Did dwarf decoder performance improve? If so, could you point me the PR? Thanks! >> >> I completely forgot that this had been an issue. The comment was even written by me :( >> >> No, Elf decoder is still slow. But I have found myself too many times staring at NMT output now trying to make sense of the offsets. Missing source info in combination with the small stack size of 4 makes investigations a pain. >> >> I added a simple caching mechanism to aid printing. Its pretty straight-forward, but still I am not sure it is worth the complexity. Here the numbers: >> >> Running all NMT jtreg tests: >> >> * Stock JVM (no source info): 40 seconds >> * Source info: 2 min 30 seconds >> * Source info + caching: 1 min 15 seconds >> >> I think that is acceptable. Any more intricate caching would be over the complexity-benefit line. > > I simply pointed out your own old concern. If you are happy with the final performance now, then I'm good. > > I will look at the cache shortly. 
@gerard-ziemski @jdksjolen I did a small revamp. The strings are now stored in an Arena. That allows for compact storage and prevents truncation should a frame be longer than 1024 chars (probably never happen). Storage cost is ~96K for a detail report, compared to ~700K with fixed-sized 1024 byte entries. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2183974599 From azafari at openjdk.org Sat Jun 22 16:45:10 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Sat, 22 Jun 2024 16:45:10 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v3] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 06:37:32 GMT, Thomas Stuefe wrote: >> Arenas carry NMT flags. >> >> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. >> >> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. >> >> The patch does that: >> - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) >> - CompilerThread hands in mtCompiler, all other threads rely on the default >> - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in >> - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena >> - it also allows us to make Arena::flags private >> >> Other, unrelated cleanups: >> - Made Arena::_size_in_bytes and Arena::_tag private >> - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor >> - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. 
There was only a single use for that constructor, and I replaced it with functionally equivalent code. >> >> Tests: >> >> I manually verified that the NMT numbers printed don't change. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - feedback david > - Merge branch 'master' into arena-constify-memflags > - feedback johan > - Merge branch 'master' into arena-constify-memflags > - start Thanks, I could only found some missing Copyright updates: - `compilerThread.cpp` - `resourceArea.cpp` and its `hpp` - `jni.cpp`, maybe for RedHat also? - `handles.hpp` - `thread.hpp` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19693#issuecomment-2184094224 From mdoerr at openjdk.org Sat Jun 22 20:07:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 22 Jun 2024 20:07:13 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v4] In-Reply-To: References: Message-ID: On Sat, 22 Jun 2024 03:35:35 GMT, Amit Kumar wrote: > > I guess the old code uses slightly better instruction sequences which more than compensate the overhead of iterating twice. > > Similar issue I'm facing for s390x as well. We are not getting that much performance improvement as aarch64 & x86 are showing. I'll look further into it. Thanks for working on it and for investigating. If there's no clear performance gain, I think such changes should better not get integrated. The old code is better readable IMHO. The PRs don't get lost if we close them. We can still reopen them in case we need them. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2184174124 From stuefe at openjdk.org Sun Jun 23 06:09:37 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 23 Jun 2024 06:09:37 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v4] In-Reply-To: References: Message-ID: > Arenas carry NMT flags. > > An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. > > As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. > > The patch does that: > - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) > - CompilerThread hands in mtCompiler, all other threads rely on the default > - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in > - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena > - it also allows us to make Arena::flags private > > Other, unrelated cleanups: > - Made Arena::_size_in_bytes and Arena::_tag private > - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor > - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. > > Tests: > > I manually verified that the NMT numbers printed don't change. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - fix copyrights - merge - feedback david - Merge branch 'master' into arena-constify-memflags - feedback johan - Merge branch 'master' into arena-constify-memflags - start ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19693/files - new: https://git.openjdk.org/jdk/pull/19693/files/ac61bf42..b2e1b113 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19693&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19693&range=02-03 Stats: 18664 lines in 253 files changed: 12759 ins; 4320 del; 1585 mod Patch: https://git.openjdk.org/jdk/pull/19693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19693/head:pull/19693 PR: https://git.openjdk.org/jdk/pull/19693 From stuefe at openjdk.org Sun Jun 23 06:09:37 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 23 Jun 2024 06:09:37 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v3] In-Reply-To: References: Message-ID: <5IN9gAx_uVyT-HaSPE9fkq6sN39tbGWJ4CHARwW7slA=.49c40990-db8e-4727-a591-532e24072c9d@github.com> On Sat, 22 Jun 2024 16:42:21 GMT, Afshin Zafari wrote: > Thanks, I could only found some missing Copyright updates: > > * `compilerThread.cpp` > * `resourceArea.cpp` and its `hpp` > * `jni.cpp`, maybe for RedHat also? > * `handles.hpp` > * `thread.hpp` Many thanks, @afshin-zafari. Fixed copyrights. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19693#issuecomment-2184619420 From jsjolen at openjdk.org Sun Jun 23 08:32:57 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 23 Jun 2024 08:32:57 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v25] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. 
It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
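Editorial sketch, not part of the PR: the description above can be illustrated with a minimal index-based freelist, assuming `std::vector` as a stand-in for HotSpot's `GrowableArray`. The class name `IndexedFreeList`, the `nil` sentinel, and the method names are hypothetical and chosen for illustration only; the actual code lives in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical sketch: std::vector stands in for HotSpot's GrowableArray.
template <typename E>
class IndexedFreeList {
public:
  using Index = std::int32_t;        // the 4-byte "pointer"
  static constexpr Index nil = -1;   // marks the end of the free list

  // Store a copy of value and return the index of its slot.
  Index allocate(const E& value) {
    static_assert(std::is_trivial<E>::value,
                  "slots never run constructors or destructors");
    Index i = _free_head;
    if (i != nil) {
      _free_head = _slots[i].next_free;  // reuse the most recently freed slot
    } else {
      i = static_cast<Index>(_slots.size());
      _slots.emplace_back();             // grow; storage is never given back
    }
    _slots[i].value = value;
    return i;
  }

  // Return slot i to the free list. The backing array does not shrink.
  void deallocate(Index i) {
    assert(i >= 0 && static_cast<std::size_t>(i) < _slots.size());
    _slots[i].next_free = _free_head;
    _free_head = i;
  }

  E& at(Index i) {
    assert(i >= 0 && static_cast<std::size_t>(i) < _slots.size());
    return _slots[i].value;
  }

  std::size_t capacity() const { return _slots.size(); }

private:
  // A slot holds either a live element or the link to the next unused slot.
  union Slot {
    E value;
    Index next_free;
  };
  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

Because unused slots are chained through the array itself, a freed index is handed out again before the array grows, and an index costs 4 bytes where a native pointer costs 8 on 64-bit platforms.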
Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Move put into cpp file - Rename things in NCSS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/479e9573..f6247f78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=23-24 Stats: 72 lines in 2 files changed: 43 ins; 16 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Sun Jun 23 08:32:57 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Sun, 23 Jun 2024 08:32:57 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v24] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 13:06:38 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities.
It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Rename free to deallocate Surely a problem with `HomogenousObjectArray` is that all arrays store objects of the same type. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2184891815 From jsjolen at openjdk.org Sun Jun 23 09:03:11 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 23 Jun 2024 09:03:11 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v5] In-Reply-To: References: Message-ID: On Sat, 22 Jun 2024 10:37:27 GMT, Thomas Stuefe wrote: >> Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > cleanups Good job, I found some minor improvements to be had. 
src/hotspot/share/nmt/nativeCallStackPrinter.cpp line 40: > 38: for (int i = 0; i < NMT_TrackingStackDepth; i++) { > 39: const address pc = stack->get_frame(i); > 40: if (pc != nullptr) { Style suggestion: Invert condition and `continue` instead, to reduce indentation of remaining code. src/hotspot/share/nmt/nativeCallStackPrinter.cpp line 46: > 44: stringStream ss(4 * K); > 45: stack->print_frame(&ss, pc); > 46: const size_t len = strlen(ss.base()); Just use `ss.size()`. src/hotspot/share/nmt/nativeCallStackPrinter.cpp line 48: > 46: const size_t len = strlen(ss.base()); > 47: char* store = NEW_ARENA_ARRAY(&_text_storage, char, len + 1); > 48: strcpy(store, ss.base()); We have the sizes, use `memcpy`. src/hotspot/share/nmt/nativeCallStackPrinter.cpp line 49: > 47: char* store = NEW_ARENA_ARRAY(&_text_storage, char, len + 1); > 48: strcpy(store, ss.base()); > 49: (*cached_frame_text) = store; Redundant parens ------------- PR Review: https://git.openjdk.org/jdk/pull/19655#pullrequestreview-2134220333 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1650022361 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1650022628 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1650022710 PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1650022744 From jsjolen at openjdk.org Sun Jun 23 10:33:44 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 23 Jun 2024 10:33:44 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v26] In-Reply-To: References: Message-ID: <-2EpS5USC5fxnLGF5Lj5RP2tCL5gfxRow5T7ER_hdVY=.c7ca663d-8fb1-4536-beb1-15ae39a2a00a@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. 
I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Any entry must be trivial ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/f6247f78..1fd03834 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=24-25 Stats: 26 lines in 3 files changed: 4 ins; 18 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From stuefe at openjdk.org Sun Jun 23 11:33:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 23 Jun 2024 11:33:34 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing Message-ID: See JBS issue. It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. The patch: - exposes os::available_memory via Whitebox - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` I have some misgivings about this solution, though: 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? 
Open to opinions) 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. Despite my doubts, I think this is the best we can come up with if we want to have such a test. Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. ------------- Commit messages: - tweaks - fixes - Merge branch 'master' into JDK-8334513-New-test-gc-TestAlwaysPreTouchBehavior-java-is-failing - fix Changes: https://git.openjdk.org/jdk/pull/19803/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334513 Stats: 108 lines in 5 files changed: 69 ins; 18 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/19803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803 PR: https://git.openjdk.org/jdk/pull/19803 From jsjolen at openjdk.org Sun Jun 23 12:44:13 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 23 Jun 2024 12:44:13 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v26] In-Reply-To: <-2EpS5USC5fxnLGF5Lj5RP2tCL5gfxRow5T7ER_hdVY=.c7ca663d-8fb1-4536-beb1-15ae39a2a00a@github.com> References: <-2EpS5USC5fxnLGF5Lj5RP2tCL5gfxRow5T7ER_hdVY=.c7ca663d-8fb1-4536-beb1-15ae39a2a00a@github.com> Message-ID: On Sun, 23 Jun 2024 10:33:44 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. 
>> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Any entry must be trivial Hi, I've changed the data which is stored in `HOA` to be trivial and have added `static_assert`s to the methods. We **cannot** add the `static_assert` to the top of the `HOA` class because there's a mutual dependency on the data type being stored and the `HOA`. 
It's very important that the data we put into `HOA` is trivial as we do not call any destructors or copy constructors/operators on the code. **Any such "improvements" may be undesirable**; it's good to have limited containers and data which is simpler to reason about. Here's a minimal example showcasing the problem of why `static_assert` has to live in the methods and not in the class definition:

```c++
#include <type_traits>
#include <unordered_map>

template<typename E>
struct A {
  // SEE ME: Uncomment this line to see compilation errors re:
  // "incomplete types"
  // static_assert(std::is_trivial<E>::value);
  union alignas(E) Foo {
    char e[sizeof(E)];
  };
  using I = int;
  Foo* foo;
  void put(E& e) {
    static_assert(std::is_trivial<E>::value);
  }
};

struct C {
  struct B;
  using AB = A<B>;
  using I = typename AB::I;
  struct B {
    I i;
    // SEE ME! Uncomment this line to make B non-trivial
    // std::unordered_map<int, int> s;
  };
  AB a;
  void do_it() {
    B b;
    a.put(b);
  }
};

int main() {
  C c;
  c.do_it();
  return 0;
}
```

I suggest that you paste this in godbolt.org (remember to switch to C++) or compile locally, following the `SEE ME` comments to understand what's going on. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2184974034 From jsjolen at openjdk.org Sun Jun 23 12:59:40 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Sun, 23 Jun 2024 12:59:40 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v27] In-Reply-To: References: Message-ID: <_F__nEtttBi229OINyw0WscqmX7voh_wswFXyFbo8Yc=.afc13c58-dca5-4ee2-9440-2db9b10b2ac4@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table.
This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Tests are in cpp files, not hpp files - Call deallocate, not free ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/1fd03834..469e79a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=25-26 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Sun Jun 23 13:38:53 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 23 Jun 2024 13:38:53 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. 
> > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Make test node trivial type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/469e79a5..3233f39b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=26-27 Stats: 9 lines in 1 file changed: 0 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From amitkumar at openjdk.org Sun Jun 23 16:26:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 23 Jun 2024 16:26:15 GMT Subject: RFR: 8332603: [PPC64] Improve itable_stub [v4] In-Reply-To: References: Message-ID: On Sat, 22 Jun 2024 20:05:01 GMT, Martin Doerr wrote: > The PRs don't get lost if we close them. We can still reopen them in case we need them. Sure, thanks for review :-) For now I'm closing it. 
I'll see, if in future we can squeeze out more performance and integrate it ; ------------- PR Comment: https://git.openjdk.org/jdk/pull/19733#issuecomment-2185112348 From amitkumar at openjdk.org Sun Jun 23 16:26:16 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 23 Jun 2024 16:26:16 GMT Subject: Withdrawn: 8332603: [PPC64] Improve itable_stub In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 10:54:30 GMT, Amit Kumar wrote: > PPC Port similar to [JDK-8305959 (x86)](https://bugs.openjdk.org/browse/JDK-8305959) and [JDK-8307352(aarch64)](https://bugs.openjdk.org/browse/JDK-8307352) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19733 From amitkumar at openjdk.org Mon Jun 24 05:04:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 24 Jun 2024 05:04:14 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 10:19:46 GMT, Amit Kumar wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test                  avgt   15   0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test       avgt   15   1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1    avgt   15   1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2    avgt   15   1.410 ± 0.017  ns/op
>>
>> Benchmark                                     Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00          avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01          avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02          avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03          avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04          avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05          avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06          avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07          avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08          avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09          avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10          avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16          avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20          avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30          avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32          avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40          avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50          avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55          avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56          avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57          avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58          avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59          avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60          avgt   15  27.118 ± 0.557 ...
> > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - rename: r_scratch to r_result in repne_scan method @theRealAph @TheRealMDoerr @RealLucy Can I get a review for this, please ?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19544#issuecomment-2185609102 From dholmes at openjdk.org Mon Jun 24 05:15:18 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Jun 2024 05:15:18 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. > > Tested with tier1-7. This reversion to the OL code looks good to me. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19755#pullrequestreview-2134649243 From dholmes at openjdk.org Mon Jun 24 05:24:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Jun 2024 05:24:28 GMT Subject: RFR: 8334239: Introduce macro for ubsan method/function exclusions [v6] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 13:50:40 GMT, Matthias Baesken wrote: >> A number of functions/methods have to be excluded from ubsan detection (e.g. because they do things that ubsan warns about, however it is still valid what is done there). >> We can simplify this by introducing a macro (similar to asan-related ATTRIBUTE_NO_ASAN, see sanitizers/address.hpp). >> Currently something like this is used : >> >> #if defined(__clang__) || defined(__GNUC__) >> __attribute__((no_sanitize("undefined"))) >> #endif > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add blank line Apologies - my review of your updates in response to my comments never got submitted. 
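Editorial sketch, not part of the review: the macro idea in the quoted PR description could look roughly like the following. The name `ATTRIBUTE_NO_UBSAN` is an assumption made by analogy with the `ATTRIBUTE_NO_ASAN` macro mentioned in the description, and `rotate_left` is a hypothetical stand-in whose body is deliberately well-defined; it only shows where the annotation goes.

```cpp
#include <cassert>
#include <cstdint>

// Assumed macro name, modeled on ATTRIBUTE_NO_ASAN. It expands to the attribute
// from the PR description on Clang and GCC, and to nothing elsewhere.
// Note: older GCC releases may not recognize this spelling of the attribute.
#if defined(__clang__) || defined(__GNUC__)
#define ATTRIBUTE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define ATTRIBUTE_NO_UBSAN
#endif

// Hypothetical function excluded from ubsan instrumentation.
ATTRIBUTE_NO_UBSAN
inline std::uint32_t rotate_left(std::uint32_t v, unsigned n) {
  assert(n > 0 && n < 32);            // keep both shift counts in range
  return (v << n) | (v >> (32 - n));
}
```

With such a macro, each excluded function carries a single annotation line where the three-line `#if` / `__attribute__` / `#endif` pattern was repeated before.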
------------- PR Review: https://git.openjdk.org/jdk/pull/19722#pullrequestreview-2134658908 From azafari at openjdk.org Mon Jun 24 08:16:15 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 24 Jun 2024 08:16:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: On Sun, 23 Jun 2024 13:38:53 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... 
Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Make test node trivial type src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 30: > 28: #include "memory/allocation.hpp" > 29: #include "utilities/growableArray.hpp" > 30: #include "nmt/homogenousObjectArray.hpp" order of headers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1650546524 From stefank at openjdk.org Mon Jun 24 08:17:21 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 24 Jun 2024 08:17:21 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v3] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Sat, 22 Jun 2024 00:18:43 GMT, Coleen Phillimore wrote: > The 'resolve' for the CLDG iterator was to temporarily keep that CLD from being unloaded, in the short time that we're iterating on that particular CLD. This is the crux of the problem. For concurrent GCs, if the 'resolve' is called during a concurrent marking the CLD will be "marked" alive, and will be considered live all the way until we start a new concurrent mark cycle. At that point, we'll try again to figure out if the CLD is dead. If you then again use the iterator during that concurrent marking, the cycle repeats. So, if you tend to use these iterators a lot, we'll never get the chance to unload the classes. 
This patch tries to combat that, by changing the iterators. With the patch the iterators hand out objects that are not dead, but they are not considered part of the live object graph. You can use the oops (and its transitive closure) in the CLD as long as you block out safepoints. However, if you try to use them after blocking for a safepoint things will break because the objects are not guaranteed to be a part of the live object graph so nothing is keeping them alive. That's what the no-keepalive is intended to refer to. It would be great if we could figure out a name that hints that the iterators are unsafe to use unless you have understood the above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19769#issuecomment-2185885243 From aph at openjdk.org Mon Jun 24 08:17:21 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 24 Jun 2024 08:17:21 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v10] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 03:18:41 GMT, Gui Cao wrote: >> Implementation of subtype checking [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) for linux-riscv64. >> This optimization depends on availability of the Zbb extension which has the cpop instruction. >> >> ### Correctness testing: >> >> - [x] Run tier1-3, hotspot:tier4 tests on SOPHON SG2042 (release) >> - [x] Run tier1-3 tests with -XX:+UseRVV on qemu 8.1.0 (fastdebug) >> - [x] Run all of tier1 with `-XX:+VerifySecondarySupers` >> >> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb, Not Enable UseZba, UseZbs >> Original: >> >> Benchmark Mode Cnt Score Error Units >> SecondarySuperCacheHits.test avgt 15 11.375 ± 0.071 ns/op >> SecondarySuperCacheInterContention.test avgt 15 646.087 ± 32.587 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 600.090 ± 83.779 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 692.084 ± 73.218 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 16.420 ± 
0.239 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 18.307 ± 0.260 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 21.695 ± 0.458 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 24.855 ± 0.664 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 27.305 ± 0.522 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 29.719 ± 0.385 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 32.231 ± 0.498 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 33.747 ± 0.603 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 35.856 ± 0.629 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 37.077 ± 0.546 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 39.408 ± 0.465 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 51.041 ± 0.547 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 58.722 ± 0.922 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 77.310 ± 0.654 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 81.116 ± 0.854 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 96.311 ± 0.840 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 115.427 ± 0.838 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 124.371 ± 1.076 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 126.796 ± 0.916 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 127.952 ± 1.202 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 131.956 ± 4.515 ns/op >> Seco... > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and fix population_count src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3802: > 3800: // Check for wraparound. > 3801: Label skip; > 3802: bge(r_array_length, r_array_index, skip); Are you sure this test is correct? I would have thought it would be `bgt`. If length == index, then you must set index to 0. I would have expected this to fail testing. 
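The off-by-one raised in the review comment above can be modelled in plain C++. This is only a sketch of the branch conditions, not the HotSpot assembler code: the function names are made up, and it assumes the index has already been advanced so that it may equal the array length (one past the last valid slot).

```cpp
#include <cassert>
#include <cstdint>

// Models `bge(r_array_length, r_array_index, skip)`: the reset to 0 is
// skipped whenever length >= index, which includes index == length.
inline std::int64_t wrap_with_bge(std::int64_t index, std::int64_t length) {
  assert(index >= 0 && index <= length);
  if (length >= index) return index;  // branch to `skip` taken
  return 0;                           // wraparound
}

// Models `bgt(r_array_length, r_array_index, skip)`: the reset is skipped
// only while index is still a valid slot.
inline std::int64_t wrap_with_bgt(std::int64_t index, std::int64_t length) {
  assert(index >= 0 && index <= length);
  if (length > index) return index;
  return 0;
}

// Models `blt(r_array_index, r_array_length, skip)`, the same condition
// as bgt with the operands swapped.
inline std::int64_t wrap_with_blt(std::int64_t index, std::int64_t length) {
  assert(index >= 0 && index <= length);
  if (index < length) return index;
  return 0;
}
```

With `length == 4` and `index == 4`, the `bge` model leaves the index at 4 (out of bounds), while the `bgt` and `blt` models reset it to 0.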
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1650548279 From jsjolen at openjdk.org Mon Jun 24 08:18:13 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 08:18:13 GMT Subject: RFR: 8322475: Extend printing for System.map [v6] In-Reply-To: <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com> References: <-Qkoj2CJIqS0pNR-3JxXULeaty66oPIAJZgFx7IskTA=.9e679c42-24e4-4fb2-a3fd-d27be65aeac0@github.com> Message-ID: On Thu, 20 Jun 2024 09:31:48 GMT, Thomas Stuefe wrote: >> This is an expansion on the new `System.map` command introduced with JDK-8318636. >> >> We now print valuable information per memory region, such as: >> >> - the actual resident set size >> - the actual number of huge pages >> - the actual used page size >> - the THP state of the region (was advised, is eligible, uses THP, ...) >> - whether the region is shared >> - whether the region had been committed (backed by swap) >> - whether the region has been swapped out. 
>> >> Example output: >> >> >> from to size rss hugetlb pgsz prot notes vm info/file >> 0x00000000c0000000 - 0x00000000ffe00000 1071644672 0 4194304 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x00000000ffe00000 - 0x0000000100000000 2097152 0 0 2M rw-p huge JAVAHEAP /anon_hugepage >> 0x0000558016b67000 - 0x0000558016b68000 4096 4096 0 4K r--p /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x0000558016b68000 - 0x0000558016b69000 4096 4096 0 4K r-xp /shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/bin/java >> 0x00007f3a749f2000 - 0x00007f3a74c62000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a74c62000 - 0x00007f3a7be51000 119468032 0 0 4K ---p nores CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7be51000 - 0x00007f3a7c1c1000 3604480 3604480 0 4K rwxp CODE(CodeHeap 'profiled nmethods') >> 0x00007f3a7c1c1000 - 0x00007f3a7c592000 4001792 0 0 4K ---p nores CODE(CodeHeap 'non-nmethods') >> 0x00007f3a7c592000 - 0x00007f3a7c802000 2555904 2555904 0 4K rwxp CODE(CodeHeap 'non-profiled nmethods') ... > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - feedback johan > - fix merge errors > - Merge branch 'master' into System.maps-more-info > - copyrights > - Merge branch 'master' into System.maps-more-info > - fix merge issue > - Merge branch 'master' into System.maps-more-info > - fix whitespace issue > - wip > - exhuming > - ... and 13 more: https://git.openjdk.org/jdk/compare/c6f3bf4b...940199de This looks good to me. ------------- Marked as reviewed by jsjolen (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17158#pullrequestreview-2135005516 From azafari at openjdk.org Mon Jun 24 08:22:16 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 24 Jun 2024 08:22:16 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: On Sun, 23 Jun 2024 13:38:53 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... 
Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Make test node trivial type Thanks Johan. One small nit on order of header files. ------------- PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2135016470 From fyang at openjdk.org Mon Jun 24 08:48:17 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Jun 2024 08:48:17 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v10] In-Reply-To: References: Message-ID: <2RCvoTzEWMk4XQYWQpd7YUbBS1Oi7Dq93A51fpfTEtk=.a1995f99-522a-4a92-b260-db7da741188f@github.com> On Mon, 24 Jun 2024 08:14:25 GMT, Andrew Haley wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment and fix population_count > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3802: > >> 3800: // Check for wraparound. >> 3801: Label skip; >> 3802: bge(r_array_length, r_array_index, skip); > > Are you sure this test is correct? I would have thought it would be `bgt`. If length == index, then you must set index to 0. I would have expected this to fail testing. Good catch! Yes, I agree that we should use `bgt` here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1650601267 From shade at openjdk.org Mon Jun 24 08:49:26 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Jun 2024 08:49:26 GMT Subject: Integrated: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 09:11:04 GMT, Aleksey Shipilev wrote: > As shown in the bug, there are cases when acquiring the `ServiceLock` for opportunistic notification leads to deadlock. We can untie the deadlock by checking if `ServiceLock` can be acquired on triggering path, and never blocking otherwise. > > Additional testing: > - [x] Linux x86_64 service fastdebug, `all` > - [x] Linux AArch64 service fastdebug, `all` This pull request has now been integrated. Changeset: 05ff3185 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/05ff3185edd25b381a97f6879f496e97b62dddc2 Stats: 12 lines in 6 files changed: 3 ins; 0 del; 9 mod 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 Reviewed-by: stefank, eosterlund, coleenp, zgu ------------- PR: https://git.openjdk.org/jdk/pull/19800 From shade at openjdk.org Mon Jun 24 09:07:25 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 24 Jun 2024 09:07:25 GMT Subject: [jdk23] RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 Message-ID: Clean backport to fix a deadlock. 
------------- Commit messages: - Backport 05ff3185edd25b381a97f6879f496e97b62dddc2 Changes: https://git.openjdk.org/jdk/pull/19851/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19851&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334594 Stats: 12 lines in 6 files changed: 3 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/19851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19851/head:pull/19851 PR: https://git.openjdk.org/jdk/pull/19851 From stefank at openjdk.org Mon Jun 24 09:20:15 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 24 Jun 2024 09:20:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: On Sun, 23 Jun 2024 13:38:53 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. 
>> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Make test node trivial type Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous? Would it make sense to put this in utilities/ instead of nmt/. The include guards should be placed before the includes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2186013453 From gcao at openjdk.org Mon Jun 24 09:26:17 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 24 Jun 2024 09:26:17 GMT Subject: RFR: 8332587: RISC-V: secondary_super_cache does not scale well [v10] In-Reply-To: <2RCvoTzEWMk4XQYWQpd7YUbBS1Oi7Dq93A51fpfTEtk=.a1995f99-522a-4a92-b260-db7da741188f@github.com> References: <2RCvoTzEWMk4XQYWQpd7YUbBS1Oi7Dq93A51fpfTEtk=.a1995f99-522a-4a92-b260-db7da741188f@github.com> Message-ID: On Mon, 24 Jun 2024 08:45:19 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3802: >> >>> 3800: // Check for wraparound. >>> 3801: Label skip; >>> 3802: bge(r_array_length, r_array_index, skip); >> >> Are you sure this test is correct? I would have thought it would be `bgt`. If length == index, then you must set index to 0. I would have expected this to fail testing. > > Good catch! Yes, I agree that we should use `bgt` here. > Are you sure this test is correct? I would have thought it would be `bgt`. If length == index, then you must set index to 0. I would have expected this to fail testing. Thanks, I will fix it like `blt(r_array_index,r_array_length,skip);` Linked JBS: https://bugs.openjdk.org/browse/JDK-8334843. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19320#discussion_r1650667024 From maxim.kartashev at jetbrains.com Mon Jun 24 09:28:26 2024 From: maxim.kartashev at jetbrains.com (Maxim Kartashev) Date: Mon, 24 Jun 2024 13:28:26 +0400 Subject: RFO: a tool to analyze HotSpot fatal error logs In-Reply-To: References: <76ed1a5d-ec36-4e2c-a31a-af4c80b592a1@oracle.com> Message-ID: The tool to examine HotSpot crashes has been published as a plugin: https://plugins.jetbrains.com/plugin/24675-hotspot-crash-examiner Its source code is fully open: https://github.com/JetBrains/HotSpotCrashExaminerPlugin I hope the community around HotSpot will find it useful. 
On Wed, Apr 24, 2024 at 3:14 PM Brice Dutheil wrote: > > I would find this tool particularly useful. I came across various `hs_err_pid` files, while one can read them this can speed analyzing up significantly. > > Even for simple cases, like a bug in the FFM API usage (it's still possible to dereference a bad address). > > Looking forward to this plugin ! > > Thanks for proposing the idea. > -- Brice > > > On Fri, Apr 12, 2024 at 5:35 PM Laurence Cable wrote: >> >> Hi Maxim, a great idea, the JDK serviceability team here at Oracle would >> like to assist you in any way we can. >> >> I think also we should (in the future) consider the format of the error >> file and associated jcmd and perhaps render the content >> in a format that is better suited to programmatic parsing even a >> jq-esque formatter that took the human-readable format and >> re-formatted would be useful (IMO) >> >> Rgds >> >> - Larry Cable >> >> >> On 4/11/24 7:05 AM, Maxim Kartashev wrote: >> > Hello, >> > >> > I am writing to inquire about the potential interest of the people >> > involved in inspecting HotSpot crashes in a tool aimed at facilitating >> > that inspection. >> > >> > We at JetBrains have developed an internal plugin that helps both with >> > filtering through dozens of reports quickly in order to find a pattern >> > and for diving deep into a particular crash. 
In addition to the >> > "standard" features such as syntax highlighting, folding, and >> > structural navigation, it will >> > * highlight potential problems such as overloaded CPU, low physical >> > memory, the presence of OOME in the recent exceptions, LD_LIBRARY_PATH >> > being set, etc, >> > * generate an "executive summary" for a high-level overview, for >> > example, by front-line support, >> > * pop up a tooltip for any recognized address describing its origin >> > (for example, if it belongs to some thread's stack, the Java heap, a >> > register, or a memory-mapped region), >> > * provide the ability to highlight all addresses "near" the selected >> > address, including registers, threads, and memory-mapped regions. >> > >> > If there is sufficient interest in creating a public and/or >> > open-source variant of this internal plugin, I will pitch the idea to >> > my employer. It shouldn't be too much work to create a public version. >> > >> > Kind regards, >> > Maxim. >> > >> > References: >> > * https://docs.oracle.com/javase/10/troubleshoot/fatal-error-log.htm >> > >> From jsjolen at openjdk.org Mon Jun 24 09:51:49 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 09:51:49 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v29] In-Reply-To: References: Message-ID: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. 
For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
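To make the idea in the PR description above concrete, here is a minimal sketch of an index-based freelist allocator. It is not the `IndexedFreelistAllocator` from the PR: the class name `IndexFreelist`, its methods, and the use of `std::vector` in place of `GrowableArray` are all assumptions made for illustration. For simplicity each slot stores the value and the free-list link side by side, whereas a real implementation would probably overlay them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: hands out 4-byte indices instead of pointers and
// threads a singly linked free list through unused slots. Like the
// allocator described above, it never shrinks its backing storage.
template <typename T>
class IndexFreelist {
 public:
  using Index = std::uint32_t;
  static constexpr Index nil = UINT32_MAX;  // "null" index

  Index allocate(const T& value) {
    if (_free_head != nil) {
      // Reuse the most recently freed slot.
      Index i = _free_head;
      _free_head = _slots[i].next_free;
      _slots[i].value = value;
      return i;
    }
    assert(_slots.size() < nil);
    _slots.push_back(Slot{value, nil});
    return static_cast<Index>(_slots.size() - 1);
  }

  void free(Index i) {
    assert(i < _slots.size());
    _slots[i].next_free = _free_head;  // push slot onto the free list
    _free_head = i;
  }

  T& at(Index i) {
    assert(i < _slots.size());
    return _slots[i].value;
  }

  std::size_t capacity() const { return _slots.size(); }

 private:
  struct Slot {
    T value;
    Index next_free;
  };
  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

A hash table bucket can then store an `Index` (4 bytes) rather than a pointer (8 bytes on 64-bit platforms), which is where the per-element saving mentioned in the description comes from.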
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Correct placement of include guards ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/3233f39b..4d631c11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=27-28 Stats: 6 lines in 2 files changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Mon Jun 24 09:56:28 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 24 Jun 2024 09:56:28 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 09:17:28 GMT, Stefan Karlsson wrote: >The include guards should be placed before the includes. Thanks! >Would it make sense to put this in utilities/ instead of nmt/. Do we have anyone else that wants to use this? The threshold in terms of API quality is typically higher for something in utilities/ than a local utility. I'm working on some improvements to this allocator already, maybe we can take a move to utilities/ together with those improvements? > Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous? I agree on the name part. We could call it a `HomogenousAllocator`, that does differentiate it meaningfully from something like `Arena` which can allocate anything. 
@tstuefe , arenas are also an example of the storage/lifetime of the objects from an allocator being bound to the lifetime of that allocator, so I think this makes sense. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2186092159 From jsjolen at openjdk.org Mon Jun 24 10:05:16 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 10:05:16 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 08:13:03 GMT, Afshin Zafari wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Make test node trivial type > > src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 30: > >> 28: #include "memory/allocation.hpp" >> 29: #include "utilities/growableArray.hpp" >> 30: #include "nmt/homogenousObjectArray.hpp" > > order of headers. Thanks! Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1650741067 From stefank at openjdk.org Mon Jun 24 10:11:21 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 24 Jun 2024 10:11:21 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 04:16:08 GMT, Liming Liu wrote: >> Meanwhile, I am warming to the current approach. I understand that this it avoids referring to individual downstream vendors, which I agree may be brittle. >> >> My main concern is to prevent future flag mismatches. Therefore, my proposal is to do what this patch does, but in a more generic way. Essentially, encoding that for certain flags, we cannot rely on older kernel correctly ignoring them. But we assume that downstream kernel vendors will at least fix conflicts when they merge in flags from mainline. 
We sacrifice the ability to benefit from vendor-specific backports, but that is the compromise. >> >> The flags I'd like to guard for now are: >> 1) UEK7: MADV_DONTNEED_LOCKED -> MADV_DOEXEC >> 2) UEK7: MADV_COLLAPSE -> MADV_DONTEXEC >> 3) UEK6: MADV_POPULATE_READ -> MADV_DOEXEC >> 4) UEK6: MADV_POPULATE_WRITE -> MADV_DONTEXEC >> >> If the vendor keeps up its routine of just shifting the proprietary flags to the end of the numerical MADV range for each new mainline flag, we will continue to have problems and this list may grow. >> >> The mechanism could be very close to what @limingliu-ampere does now, only a tad more generic. E.g.: >> >> >> bool os::Linux::can_use_madvise_flag(int someflag) { >> // have a hardcoded array of { flag, kernel version } tupels. >> // Search it for someflag, and if found, return false if host kernel version is older than the encoded version. >> // Otherwise return true. >> } >> >> >> and then maybe wrap the madvise call with something like this: >> >> >> bool os::Linux::checked_madvise(..., someflag) { >> assert(can_use_madvise_flag(someflag)) >> call real madvise >> } >> >> >> in addition to something like this in initialization: >> >> >> if (UseMadvPopulateWrite && ! can_use_madvise_flag(MADV_POPULATE_WRITE)) { >> FLAG_SET_ERGO(UseMadvPopulateWrite, false); >> } >> >> >> Do you like this, does this make sense? > > Hi, @tstuefe. Could you please take a look? The patch had been limited to testcases, as there were already fixes in UEK and you created a ticket to cover pretouch. I don't think @limingliu-ampere has enough rights to backport this fix. @tstuefe is this something that should be backported to JDK 23? 
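The `can_use_madvise_flag` proposal quoted above (a hardcoded table of flag and kernel version pairs) could be sketched roughly as follows. This is an illustration under stated assumptions, not HotSpot code: the host kernel version is passed in as a parameter instead of being queried, the flag values and minimum kernel versions in the table are assumptions that should be checked against the mainline kernel headers, and `checked_flag` only marks the spot where a real `madvise` call would go.

```cpp
#include <cassert>

struct KernelVersion {
  int major_num;
  int minor_num;
};

struct MadviseFlagRule {
  int flag;                   // numeric MADV_* value
  KernelVersion min_version;  // first mainline kernel assumed to define it
};

// Assumed values; verify against <linux/mman.h> before relying on them.
constexpr int kMadvPopulateRead   = 22;
constexpr int kMadvPopulateWrite  = 23;
constexpr int kMadvDontneedLocked = 24;
constexpr int kMadvCollapse       = 25;

static const MadviseFlagRule kRules[] = {
  { kMadvPopulateRead,   {5, 14} },
  { kMadvPopulateWrite,  {5, 14} },
  { kMadvDontneedLocked, {5, 18} },
  { kMadvCollapse,       {6, 1}  },
};

inline bool at_least(KernelVersion host, KernelVersion required) {
  return host.major_num > required.major_num ||
         (host.major_num == required.major_num &&
          host.minor_num >= required.minor_num);
}

// Returns false only for flags in the table when the host kernel is older
// than the version recorded for that flag; all other flags are allowed.
inline bool can_use_madvise_flag(int flag, KernelVersion host) {
  for (const MadviseFlagRule& rule : kRules) {
    if (rule.flag == flag) {
      return at_least(host, rule.min_version);
    }
  }
  return true;
}

// Stand-in for the proposed checked wrapper: asserts the guard, then
// returns the flag that a real implementation would pass to madvise.
inline int checked_flag(int flag, KernelVersion host) {
  assert(can_use_madvise_flag(flag, host));
  return flag;
}
```

During initialization, the same predicate could be used to switch off an option such as `UseMadvPopulateWrite` when the guard returns false, as the quoted proposal suggests.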
------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2186178376 From stuefe at openjdk.org Mon Jun 24 11:33:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jun 2024 11:33:15 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 09:54:00 GMT, Johan Sj?len wrote: > > The include guards should be placed before the includes. > > Thanks! > > > Would it make sense to put this in utilities/ instead of nmt/. > > Do we have anyone else that wants to use this? The threshold in terms of API quality is typically higher for something in utilities/ than a local utility. I'm working on some improvements to this allocator already, maybe we can take a move to utilities/ together with those improvements? Yes, I will use it to replace some code in Metaspace at least, but for that I need some more features (placing this thing atop of an existing memory range, and templatized index type). Can all be done in a follow-up RFE. > > > Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous? > > I agree on the name part. We could call it a `HomogenousAllocator`, that does differentiate it meaningfully from something like `Arena` which can allocate anything. @tstuefe , arenas are also an example of the storage/lifetime of the objects from an allocator being bound to the lifetime of that allocator, so I think this makes sense. Names are hard. We already have two types of arenas in hotspots, plus glibc has arenas, so "arena" is not ideal either. Arena does usually not imply a free list either, nor homogenous sizes. Thinking about it, Array already sort of implies homogeneous sizes, so maybe "homogeneous" is redundant. ArrayWithFreeList? 
Unsexy but precise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2186354479 From stuefe at openjdk.org Mon Jun 24 11:41:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jun 2024 11:41:13 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v29] In-Reply-To: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> References: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> Message-ID: <-3oOR2zJr-kzAIw3ULK53nW_coHcmdVW3TcH27Vux8I=.1f28b261-0ce9-4824-baf3-f6cc7d1ccc73@github.com> On Mon, 24 Jun 2024 09:51:49 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. 
>> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Correct placement of include guards > Hi, > > I've changed the data which is stored in `HOA` to be trivial and have added `static_assert`s to the methods. ... It's very important that the data we put into `HOA` is trivial as we do not call any destructors or copy constructors/operators on the code. We don't, but why forbid the user from doing that? I may want to place a non-trivial object in there, e.g. one where I forbid copying altogether, or one with a single non-default ctor? What would be the problem with that? Forbidding copying would be a good way to make sure the only instance exists at the place the array provides. So its address stable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2186371303 From stuefe at openjdk.org Mon Jun 24 11:44:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jun 2024 11:44:18 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 04:16:08 GMT, Liming Liu wrote: >> Meanwhile, I am warming to the current approach. I understand that this it avoids referring to individual downstream vendors, which I agree may be brittle. 
>> >> My main concern is to prevent future flag mismatches. Therefore, my proposal is to do what this patch does, but in a more generic way. Essentially, encoding that for certain flags, we cannot rely on older kernel correctly ignoring them. But we assume that downstream kernel vendors will at least fix conflicts when they merge in flags from mainline. We sacrifice the ability to benefit from vendor-specific backports, but that is the compromise. >> >> The flags I'd like to guard for now are: >> 1) UEK7: MADV_DONTNEED_LOCKED -> MADV_DOEXEC >> 2) UEK7: MADV_COLLAPSE -> MADV_DONTEXEC >> 3) UEK6: MADV_POPULATE_READ -> MADV_DOEXEC >> 4) UEK6: MADV_POPULATE_WRITE -> MADV_DONTEXEC >> >> If the vendor keeps up its routine of just shifting the proprietary flags to the end of the numerical MADV range for each new mainline flag, we will continue to have problems and this list may grow. >> >> The mechanism could be very close to what @limingliu-ampere does now, only a tad more generic. E.g.: >> >> >> bool os::Linux::can_use_madvise_flag(int someflag) { >> // have a hardcoded array of { flag, kernel version } tupels. >> // Search it for someflag, and if found, return false if host kernel version is older than the encoded version. >> // Otherwise return true. >> } >> >> >> and then maybe wrap the madvise call with something like this: >> >> >> bool os::Linux::checked_madvise(..., someflag) { >> assert(can_use_madvise_flag(someflag)) >> call real madvise >> } >> >> >> in addition to something like this in initialization: >> >> >> if (UseMadvPopulateWrite && ! can_use_madvise_flag(MADV_POPULATE_WRITE)) { >> FLAG_SET_ERGO(UseMadvPopulateWrite, false); >> } >> >> >> Do you like this, does this make sense? > > Hi, @tstuefe. Could you please take a look? The patch had been limited to testcases, as there were already fixes in UEK and you created a ticket to cover pretouch. > I don't think @limingliu-ampere has enough rights to backport this fix. 
@tstuefe is this something that should be backported to JDK 23? Backporting this is fine and low risk ------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2186375816 From stuefe at openjdk.org Mon Jun 24 11:52:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jun 2024 11:52:13 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v29] In-Reply-To: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> References: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> Message-ID: On Mon, 24 Jun 2024 09:51:49 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. 
>> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Correct placement of include guards Almost good. test/hotspot/gtest/nmt/test_homogenousObjectArray.cpp line 152: > 150: A::I i3 = alloc.allocate(0); > 151: EXPECT_EQ(p1, &alloc.at(i3)); > 152: } I think for what little the tests do, testing with different list types is overengineered. You could just scrap both LL and LL2, and just use a HOA directly. ------------- PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2135572565 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1650897160 From stuefe at openjdk.org Mon Jun 24 11:56:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jun 2024 11:56:17 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII Message-ID: Motivated by analyzing CDS dump differences in the context of reproducible builds, I found an optional ASCII printout to be very valuable. 
As usual with hex dumps, the ASCII follows the hex printout.

Example:

0x00000000000001c0: 204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820 OpenJDK 64-Bit Server VM (fastde
0x00000000000001e0: 692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f bug 24-internal-adhoc.thomas.sou
0x0000000000000200: 726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820 rce) for linux-amd64 JRE (24-int
0x0000000000000220: 64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965 ernal-adhoc.thomas.source), buil
0x0000000000000240: 323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a t on 2024-06-20T10:45:12Z with g
0x0000000000000260: 2e352e3031206363 0000000000000030 0000000000000000 0000000000000000 cc 10.5.0_______________________
0x0000000000000280: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________
0x00000000000002a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________

The patch does that.

Small unrelated changes:

- I rewrote and extended the gtests, which now test a real-life printout containing a mixture of readable and non-readable pages, and printable and non-printable characters. I re-enabled the tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved.
- The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 (i.e. ffffffff) in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t).
- I got tired of casting constness away from the to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const.
os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar. ---- Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-endian machines and therefore made those changes blindly. Any maintainers of big-endian platforms should either test this or trust me. ------------- Commit messages: - fixes - wtf - remove unused variable - fixes - fix windows build - fix ascii printout on macos - fix error on 32-bit when printing with giant unit size - wip - fix copyrights - start Changes: https://git.openjdk.org/jdk/pull/19835/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19835&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334738 Stats: 160 lines in 10 files changed: 61 ins; 15 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/19835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19835/head:pull/19835 PR: https://git.openjdk.org/jdk/pull/19835 From jsjolen at openjdk.org Mon Jun 24 11:56:13 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 11:56:13 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v28] In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 11:30:32 GMT, Thomas Stuefe wrote: > > > The include guards should be placed before the includes. > > > > > > Thanks! > > > Would it make sense to put this in utilities/ instead of nmt/. > > > > > > Do we have anyone else that wants to use this? The threshold in terms of API quality is typically higher for something in utilities/ than a local utility. I'm working on some improvements to this allocator already, maybe we can take a move to utilities/ together with those improvements? 
> > Yes, I will use it to replace some code in Metaspace at least, but for that I need some more features (placing this thing atop of an existing memory range, and templatized index type). > > Can all be done in a follow-up RFE. > > > > Could we try another round of coming up with a better name for this utility? HomogenousObjectArray is eerily similar to G1's humongous object arrays. It's also not clear to me what makes this array an homogenous array. Is our other arrays non-homogenous? > > > > > > I agree on the name part. We could call it a `HomogenousAllocator`, that does differentiate it meaningfully from something like `Arena` which can allocate anything. @tstuefe , arenas are also an example of the storage/lifetime of the objects from an allocator being bound to the lifetime of that allocator, so I think this makes sense. > > Names are hard. > > We already have two types of arenas in hotspots, plus glibc has arenas, so "arena" is not ideal either. Arena does usually not imply a free list either, nor homogenous sizes. > > Thinking about it, Array already sort of implies homogeneous sizes, so maybe "homogeneous" is redundant. > > ArrayWithFreeList? Unsexy but precise. Agreed, let's go with `ArrayWithFreeList`. > > Hi, > > I've changed the data which is stored in `HOA` to be trivial and have added `static_assert`s to the methods. ... It's very important that the data we put into `HOA` is trivial as we do not call any destructors or copy constructors/operators on the code. > > We don't, but why forbid the user from doing that? I may want to place a non-trivial object in there, e.g. one where I forbid copying altogether, or one with a single non-default ctor? What would be the problem with that? > > Forbidding copying would be a good way to make sure the only instance exists at the place the array provides. So its address stable. We can't have what you suggest at the moment, as objects aren't address stable without a fixed and non-growing memory area. 
Any form of constructor is actually fine, as those are called as they are. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2186396971 From jsjolen at openjdk.org Mon Jun 24 12:01:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 12:01:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v29] In-Reply-To: References: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> Message-ID: On Mon, 24 Jun 2024 11:48:41 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Correct placement of include guards > > test/hotspot/gtest/nmt/test_homogenousObjectArray.cpp line 152: > >> 150: A::I i3 = alloc.allocate(0); >> 151: EXPECT_EQ(p1, &alloc.at(i3)); >> 152: } > > I think for what little the tests do, testing with different list types is overengineered. You could just scrap both LL and LL2, and just use a HOA directly. I'd like to push back on that. I believe that while the list tests aren't testing that much they do serve as a form of example of how to use the HOA. It's also useful to have this in the form of a test, as opposed to a comment, as breaking changes to the HOA will break the examples, forcing them to be updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1650909660 From jsjolen at openjdk.org Mon Jun 24 12:09:01 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 12:09:01 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v30] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. 
It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
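
The idea in the description above (an array-backed allocator that hands out 4-byte indices and threads its unused slots into a free list) can be sketched as below. This is a minimal illustration under stated assumptions, not the PR's implementation: `ArrayWithFreeListSketch` is a hypothetical name, `std::vector` stands in for HotSpot's `GrowableArray`, and the element type is restricted to trivially copyable, trivially destructible types.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch: elements live in a growable backing array and are
// referred to by a 32-bit index instead of a pointer.
template <typename E>
class ArrayWithFreeListSketch {
  static_assert(std::is_trivially_copyable<E>::value, "sketch copies slots on growth");
  static_assert(std::is_trivially_destructible<E>::value, "sketch never runs destructors");

 public:
  using I = std::int32_t;        // the 4-byte "pointer"
  static constexpr I nil = -1;   // marks the end of the free list

 private:
  // A slot holds either a live element or the index of the next free slot.
  union Slot {
    E e;
    I next_free;
    Slot() : next_free(nil) {}
  };

  std::vector<Slot> _slots;      // stand-in for GrowableArray
  I _free_head = nil;

 public:
  // Reuses a freed slot if one exists, otherwise grows the backing array.
  template <typename... Args>
  I allocate(Args&&... args) {
    I idx;
    if (_free_head != nil) {
      idx = _free_head;
      _free_head = _slots[idx].next_free;
    } else {
      idx = static_cast<I>(_slots.size());
      _slots.emplace_back();
    }
    ::new (static_cast<void*>(&_slots[idx].e)) E(std::forward<Args>(args)...);
    return idx;
  }

  // Pushes the slot onto the free list; the backing array never shrinks.
  void deallocate(I idx) {
    _slots[idx].next_free = _free_head;
    _free_head = idx;
  }

  E& at(I idx) { return _slots[idx].e; }
};
```

Indices stay valid when the backing array grows, whereas the address returned by `at()` may change after a later `allocate()`; callers of such a structure would therefore store the index rather than the address.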
Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Fix - Rename and use a more advanced constexpr function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/4d631c11..478d3127 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=28-29 Stats: 210 lines in 4 files changed: 106 ins; 98 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Mon Jun 24 12:23:39 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 12:23:39 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v31] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. 
> > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: A bit of a mouthful, but will pay off in future I hope ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/478d3127..474a6098 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=29-30 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From rehn at openjdk.org Mon Jun 24 13:01:42 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Jun 2024 13:01:42 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v15] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. 
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not you get hot fast calls relies on luck.
>
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
>
> Code stream:
> JAL
> Stubs:
> AUIPC
> LD
> JALR
>
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
>
> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
>
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
>
> An experimental option for turning trampolines back on exists.
>
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach, and vice versa.
>
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>
> fop (msec) 2239 | 2128 = 0.950424
> h2 (msec) 18660 | 16594 = 0.889282
> jython (msec) 22022 | 21925 = 0.995595
> luindex (msec) 2866 | 2842 = 0.991626
> lusearch (msec) 4108 | 4311 = 1.04942
> lusearch-fix (msec) 4406 | 4116 = 0.934181
> pmd (msec) 5976 | 5897 = 0.98678
> jython (msec) 22022 | 21925 = 0.995595
> Avg: 0.974112
> fop(xcomp) (msec) 2721 | 2714 = 0.997427
> h2(xcomp) (msec) 37719 | 38004 = 1.00756
> jython(xcomp) ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 23 commits: - Merge branch 'master' into 8332689 - Minor review comments - Merge branch 'master' into 8332689 - To be pushed - Merge branch 'master' into 8332689 - Review comments, removed dead code. - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - Merge branch 'master' into 8332689 - ... and 13 more: https://git.openjdk.org/jdk/compare/9d4a4bd2...ea013d08 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=14 Stats: 874 lines in 16 files changed: 611 ins; 168 del; 95 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From stuefe at openjdk.org Mon Jun 24 13:07:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jun 2024 13:07:15 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v3] In-Reply-To: References: Message-ID: On Sat, 22 Jun 2024 16:42:21 GMT, Afshin Zafari wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - feedback david >> - Merge branch 'master' into arena-constify-memflags >> - feedback johan >> - Merge branch 'master' into arena-constify-memflags >> - start > > Thanks, I could only found some missing Copyright updates: > - `compilerThread.cpp` > - `resourceArea.cpp` and its `hpp` > - `jni.cpp`, maybe for RedHat also? > - `handles.hpp` > - `thread.hpp` @afshin-zafari if you are happy, mind approving? Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19693#issuecomment-2186536525 From mli at openjdk.org Mon Jun 24 13:29:42 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 24 Jun 2024 13:29:42 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v7] In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> Message-ID: <95MVYMyj601Y-E4QA4paQBFF9otmFrOPVxSMYYqwiBE=.c797bdc6-f90a-4785-b5a1-d26dd3024630@github.com> > Hi, > Can you help to review the patch? > This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). > > Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. > Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. > > Besides of the code changes, one important task is to handle the legal process. > > Thanks! > > ## Performance > NOTE: > * `Src` means implementation in this pr, i.e. without depenency on external sleef. > * `Disabled` means disable intrinsics by `-XX:-UseVectorStubs` > * `system_sleef` means implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. build and run jdk with depenency on external sleef. > > Basically, the perf data below shows that > * this implementation has better performance than previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294), > * and both sleef versions has much better performance compared with non-sleef version. 
>
> |Benchmark                     |(size)|Src      |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disabled-Src)/Src|
> |------------------------------|------|---------|-----|------------|----------------------|---------|------------------|
> |3472:Double128Vector.ACOS     |1024  |8546.842 |ns/op|8516.007    |-0.004                |16799.273|0.966             |
> |3473:Double128Vector.ASIN     |1024  |6864.656 |ns/op|6987.328    |0.018                 |16602.442|1.419             |
> |3474:Double128Vector.ATAN     |1024  |11489.255|ns/op|12261.800   |0.067                 |26329.320|1.292             |
> |3475:Double128Vector.ATAN2    |1024  |16661.170|ns/op|17234.472   |0.034                 |42084.100|1.526             |
> |3476:Double128Vector.CBRT     |1024  |18999.387|ns/op|20298.458   |0.068                 |35998.688|0.895             |
> |3477:Double128Vector.COS      |1024  |14081.857|ns/op|14846.117   |0.054                 |24420.692|0.734             |
> |3478:Double128Vector.COSH     |1024  |12202.306|ns/op|12237.772   |0.003                 |21343.863|0.749             |
> |3479:Double128Vector.EXP      |1024  |4553.108 |ns/op|4777.638    |0.049                 |20155.903|3.427             |
> |3480:D...

Hamlin Li has updated the pull request incrementally with two additional commits since the last revision:

- sleef 3.6.1 for riscv
- sleef 3.6.1

------------- Changes: - all: https://git.openjdk.org/jdk/pull/18605/files - new: https://git.openjdk.org/jdk/pull/18605/files/36415c34..c279a3c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=05-06 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605 PR: https://git.openjdk.org/jdk/pull/18605 From stuefe at openjdk.org Mon Jun 24 14:04:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jun 2024 14:04:13 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v29] In-Reply-To: References: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> Message-ID: On Mon, 24 Jun 2024 11:58:41 GMT,
Johan Sjölen wrote:

>> test/hotspot/gtest/nmt/test_homogenousObjectArray.cpp line 152:
>>
>>> 150: A::I i3 = alloc.allocate(0);
>>> 151: EXPECT_EQ(p1, &alloc.at(i3));
>>> 152: }
>>
>> I think for what little the tests do, testing with different list types is overengineered. You could just scrap both LL and LL2, and just use a HOA directly.
>
> I'd like to push back on that. I believe that while the list tests aren't testing that much they do serve as a form of example of how to use the HOA. It's also useful to have this in the form of a test, as opposed to a comment, as breaking changes to the HOA will break the examples, forcing them to be updated.

Hmm. I am not sold on the "example" benefit. The array is quite easy in itself. You could scratch the lists, since their implementations don't do anything to really test the AWFL (new acronym, yay). With the saved LOCs you could expand the tests to cover a collection of various types, e.g.

- u1, u2, u8
- unaligned structures that need alignment (e.g. struct { void*; int; })
- trivial objects

It would be interesting to see that it works, that alignment is correct, and that nothing breaks across resizes (indexes stay stable, etc.).

I won't insist on it; I just think that the current test complexity is somewhat wasted.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1651089209 From jsjolen at openjdk.org Mon Jun 24 14:12:49 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 24 Jun 2024 14:12:49 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v32] In-Reply-To: References: Message-ID: <6BBqynMUvv-_60E_ydHakvgaJLLeHSsEfPJvFRMg9cA=.f19b8187-3a20-4ab9-adaf-1d1ad0a4cbea@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list.
This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. 
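
As an illustration of the idea described above (4-byte indices into a growable array, with unused slots chained into a freelist), here is a minimal sketch. It is not the PR's `IndexedFreelistAllocator`: the class name `IndexedFreelistSketch` and all member names are hypothetical, `std::vector` stands in for HotSpot's `GrowableArray`, and the element type is assumed to be default-constructible and copyable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch only; not the allocator from the PR.
// std::vector is an assumed stand-in for HotSpot's GrowableArray.
template <typename T>
class IndexedFreelistSketch {
public:
  using Index = std::int32_t;        // the 4-byte "pointer"
  static constexpr Index nil = -1;   // marks the end of the freelist

  // Reuse a free slot if one exists, otherwise grow the backing array.
  Index allocate(const T& value) {
    Index i;
    if (_free_head != nil) {
      i = _free_head;
      _free_head = _slots[i].next_free;
    } else {
      i = static_cast<Index>(_slots.size());
      _slots.push_back(Slot{});
    }
    _slots[i].value = value;
    return i;
  }

  // Push the slot onto the freelist; the backing array never shrinks.
  void deallocate(Index i) {
    assert(i >= 0 && static_cast<std::size_t>(i) < _slots.size());
    _slots[i].next_free = _free_head;
    _free_head = i;
  }

  T& at(Index i) { return _slots[i].value; }
  std::size_t capacity() const { return _slots.size(); }

private:
  // A production version would likely overlay value and next_free in a
  // union to save space; kept separate here for clarity.
  struct Slot {
    T value{};
    Index next_free = nil;
  };
  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

Indices stay valid when the backing array reallocates, which raw pointers into the array would not; that is the property that allows a hash table to store 4-byte indices instead of 8-byte pointers.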
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Rename tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18979/files
  - new: https://git.openjdk.org/jdk/pull/18979/files/474a6098..67a8a218

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=31
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=30-31

  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/18979.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979

PR: https://git.openjdk.org/jdk/pull/18979

From gcao at openjdk.org Mon Jun 24 14:20:35 2024
From: gcao at openjdk.org (Gui Cao)
Date: Mon, 24 Jun 2024 14:20:35 GMT
Subject: RFR: 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path
Message-ID:

The branch condition for the r_array_index wraparound check in lookup_secondary_supers_table_slow_path is wrong.

    // Check for wraparound.
    Label skip;
    bge(r_array_length, r_array_index, skip);
    mv(r_array_index, zr);
    bind(skip);

As discussed at https://github.com/openjdk/jdk/pull/19320/files#r1650548279 . If length == index, then we must set index to 0, but the `bge` above also skips the reset in that case. The correct branch is `blt(r_array_index,r_array_length,skip);`.

### Correctness testing:
- [ ] Run tier1-3 tests on SOPHON SG2042 (release)

### JMH tested on SOPHON SG2042 (does not have Zbb)
without this patch:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt  15   20.649 ± 0.147  ns/op
SecondarySupersLookup.testNegative01  avgt  15   20.649 ± 0.117  ns/op
SecondarySupersLookup.testNegative02  avgt  15   20.637 ± 0.116  ns/op
SecondarySupersLookup.testNegative03  avgt  15   20.638 ± 0.113  ns/op
SecondarySupersLookup.testNegative04  avgt  15   20.638 ± 0.127  ns/op
SecondarySupersLookup.testNegative05  avgt  15   20.639 ± 0.115  ns/op
SecondarySupersLookup.testNegative06  avgt  15   20.638 ± 0.119  ns/op
SecondarySupersLookup.testNegative07  avgt  15   20.850 ±
0.457 ns/op
SecondarySupersLookup.testNegative08  avgt  15   20.842 ± 0.459  ns/op
SecondarySupersLookup.testNegative09  avgt  15   20.650 ± 0.124  ns/op
SecondarySupersLookup.testNegative10  avgt  15   20.642 ± 0.127  ns/op
SecondarySupersLookup.testNegative16  avgt  15   20.657 ± 0.157  ns/op
SecondarySupersLookup.testNegative20  avgt  15   20.669 ± 0.152  ns/op
SecondarySupersLookup.testNegative30  avgt  15   20.668 ± 0.166  ns/op
SecondarySupersLookup.testNegative32  avgt  15   20.669 ± 0.168  ns/op
SecondarySupersLookup.testNegative40  avgt  15   20.668 ± 0.174  ns/op
SecondarySupersLookup.testNegative50  avgt  15   20.682 ± 0.194  ns/op
SecondarySupersLookup.testNegative55  avgt  15  113.369 ± 3.792  ns/op
SecondarySupersLookup.testNegative56  avgt  15  113.888 ± 3.769  ns/op
SecondarySupersLookup.testNegative57  avgt  15  115.320 ± 4.271  ns/op
SecondarySupersLookup.testNegative58  avgt  15  115.648 ± 2.985  ns/op
SecondarySupersLookup.testNegative59  avgt  15  117.730 ± 3.370  ns/op
SecondarySupersLookup.testNegative60  avgt  15  142.533 ± 3.636  ns/op
SecondarySupersLookup.testNegative61  avgt  15  144.901 ± 5.267  ns/op
SecondarySupersLookup.testNegative62  avgt  15  145.926 ± 3.799  ns/op
SecondarySupersLookup.testNegative63  avgt  15  207.704 ± 5.370  ns/op
SecondarySupersLookup.testNegative64  avgt  15  210.631 ± 3.832  ns/op
SecondarySupersLookup.testPositive01  avgt  15   20.334 ± 0.455  ns/op
SecondarySupersLookup.testPositive02  avgt  15   20.126 ± 0.101  ns/op
SecondarySupersLookup.testPositive03  avgt  15   20.126 ± 0.097  ns/op
SecondarySupersLookup.testPositive04  avgt  15   20.124 ± 0.102  ns/op
SecondarySupersLookup.testPositive05  avgt  15   20.119 ± 0.100  ns/op
SecondarySupersLookup.testPositive06  avgt  15   20.126 ± 0.098  ns/op
SecondarySupersLookup.testPositive07  avgt  15   20.321 ± 0.462  ns/op
SecondarySupersLookup.testPositive08  avgt  15   20.117 ± 0.098  ns/op
SecondarySupersLookup.testPositive09  avgt  15   20.534 ± 0.555  ns/op
SecondarySupersLookup.testPositive10  avgt  15   20.120 ± 0.100  ns/op
SecondarySupersLookup.testPositive16  avgt  15   20.125 ± 0.104  ns/op
SecondarySupersLookup.testPositive20  avgt  15   20.125 ± 0.116  ns/op
SecondarySupersLookup.testPositive30  avgt  15   20.132 ± 0.110  ns/op
SecondarySupersLookup.testPositive32  avgt  15   20.328 ± 0.449  ns/op
SecondarySupersLookup.testPositive40  avgt  15   20.132 ± 0.096  ns/op
SecondarySupersLookup.testPositive50  avgt  15   20.331 ± 0.460  ns/op
SecondarySupersLookup.testPositive60  avgt  15   20.134 ± 0.104  ns/op
SecondarySupersLookup.testPositive63  avgt  15   20.128 ± 0.104  ns/op
SecondarySupersLookup.testPositive64  avgt  15   20.334 ± 0.456  ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

with this patch:

Benchmark                             Mode  Cnt    Score   Error  Units
SecondarySupersLookup.testNegative00  avgt  15   20.644 ± 0.118  ns/op
SecondarySupersLookup.testNegative01  avgt  15   20.639 ± 0.124  ns/op
SecondarySupersLookup.testNegative02  avgt  15   20.645 ± 0.111  ns/op
SecondarySupersLookup.testNegative03  avgt  15   20.647 ± 0.114  ns/op
SecondarySupersLookup.testNegative04  avgt  15   20.641 ± 0.112  ns/op
SecondarySupersLookup.testNegative05  avgt  15   22.702 ± 0.166  ns/op
SecondarySupersLookup.testNegative06  avgt  15   20.654 ± 0.116  ns/op
SecondarySupersLookup.testNegative07  avgt  15   20.663 ± 0.147  ns/op
SecondarySupersLookup.testNegative08  avgt  15   20.650 ± 0.114  ns/op
SecondarySupersLookup.testNegative09  avgt  15   20.666 ± 0.130  ns/op
SecondarySupersLookup.testNegative10  avgt  15   20.665 ± 0.155  ns/op
SecondarySupersLookup.testNegative16  avgt  15   20.670 ± 0.154  ns/op
SecondarySupersLookup.testNegative20  avgt  15   20.674 ± 0.163  ns/op
SecondarySupersLookup.testNegative30  avgt  15   20.683 ± 0.168  ns/op
SecondarySupersLookup.testNegative32  avgt  15   20.681 ± 0.172  ns/op
SecondarySupersLookup.testNegative40  avgt  15   20.683 ± 0.167  ns/op
SecondarySupersLookup.testNegative50  avgt  15   20.691 ± 0.188  ns/op
SecondarySupersLookup.testNegative55  avgt  15  112.106 ± 3.051  ns/op
SecondarySupersLookup.testNegative56  avgt  15  112.728 ± 3.976  ns/op
SecondarySupersLookup.testNegative57  avgt  15  114.488 ± 3.391  ns/op
SecondarySupersLookup.testNegative58  avgt  15  116.445 ± 4.055  ns/op
SecondarySupersLookup.testNegative59  avgt  15  116.419 ± 3.347  ns/op
SecondarySupersLookup.testNegative60  avgt  15  144.107 ± 4.251  ns/op
SecondarySupersLookup.testNegative61  avgt  15  145.079 ± 4.456  ns/op
SecondarySupersLookup.testNegative62  avgt  15  146.440 ± 4.284  ns/op
SecondarySupersLookup.testNegative63  avgt  15  209.836 ± 8.016  ns/op
SecondarySupersLookup.testNegative64  avgt  15  209.803 ± 7.432  ns/op
SecondarySupersLookup.testPositive01  avgt  15   20.146 ± 0.111  ns/op
SecondarySupersLookup.testPositive02  avgt  15   20.136 ± 0.101  ns/op
SecondarySupersLookup.testPositive03  avgt  15   20.133 ± 0.098  ns/op
SecondarySupersLookup.testPositive04  avgt  15   20.148 ± 0.105  ns/op
SecondarySupersLookup.testPositive05  avgt  15   20.634 ± 0.097  ns/op
SecondarySupersLookup.testPositive06  avgt  15   20.135 ± 0.106  ns/op
SecondarySupersLookup.testPositive07  avgt  15   20.139 ± 0.103  ns/op
SecondarySupersLookup.testPositive08  avgt  15   20.133 ± 0.098  ns/op
SecondarySupersLookup.testPositive09  avgt  15   20.340 ± 0.456  ns/op
SecondarySupersLookup.testPositive10  avgt  15   20.135 ± 0.104  ns/op
SecondarySupersLookup.testPositive16  avgt  15   20.127 ± 0.094  ns/op
SecondarySupersLookup.testPositive20  avgt  15   20.131 ± 0.103  ns/op
SecondarySupersLookup.testPositive30  avgt  15   20.142 ± 0.102  ns/op
SecondarySupersLookup.testPositive32  avgt  15   20.135 ± 0.095  ns/op
SecondarySupersLookup.testPositive40  avgt  15   20.128 ± 0.094  ns/op
SecondarySupersLookup.testPositive50  avgt  15   20.135 ± 0.097  ns/op
SecondarySupersLookup.testPositive60  avgt  15   20.130 ± 0.094  ns/op
SecondarySupersLookup.testPositive63  avgt  15   20.139 ± 0.101  ns/op
SecondarySupersLookup.testPositive64  avgt  15   20.130 ±
0.100 ns/op
Finished running test 'micro:vm.lang.SecondarySupersLookup'

-------------

Commit messages:
 - 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path

Changes: https://git.openjdk.org/jdk/pull/19852/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19852&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334843
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19852.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19852/head:pull/19852

PR: https://git.openjdk.org/jdk/pull/19852

From fyang at openjdk.org Mon Jun 24 14:20:35 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 24 Jun 2024 14:20:35 GMT
Subject: RFR: 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path
In-Reply-To:
References:
Message-ID:

On Mon, 24 Jun 2024 09:25:31 GMT, Gui Cao wrote:

> Branch condition for r_array_index wraparound checking in lookup_secondary_supers_table_slow_path is wrong.
>
> // Check for wraparound.
> Label skip;
> bge(r_array_length, r_array_index, skip);
> mv(r_array_index, zr);
> bind(skip);
>
> As discussed at https://github.com/openjdk/jdk/pull/19320/files#r1650548279 . If length == index, then we must set index to 0. That is `blt(r_array_index,r_array_length,skip);`.
>
> ### Correctness testing:
> - [ ] Run tier1-3 tests on SOPHON SG2042 (release)
>
> ### JMH tested on SOPHON SG2042 (has not Zbb)
> without this patch:
>
> Benchmark                             Mode  Cnt    Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt  15   20.649 ± 0.147  ns/op
> SecondarySupersLookup.testNegative01  avgt  15   20.649 ± 0.117  ns/op
> SecondarySupersLookup.testNegative02  avgt  15   20.637 ± 0.116  ns/op
> SecondarySupersLookup.testNegative03  avgt  15   20.638 ± 0.113  ns/op
> SecondarySupersLookup.testNegative04  avgt  15   20.638 ± 0.127  ns/op
> SecondarySupersLookup.testNegative05  avgt  15   20.639 ±
0.115 ns/op
> SecondarySupersLookup.testNegative06  avgt  15   20.638 ± 0.119  ns/op
> SecondarySupersLookup.testNegative07  avgt  15   20.850 ± 0.457  ns/op
> SecondarySupersLookup.testNegative08  avgt  15   20.842 ± 0.459  ns/op
> SecondarySupersLookup.testNegative09  avgt  15   20.650 ± 0.124  ns/op
> SecondarySupersLookup.testNegative10  avgt  15   20.642 ± 0.127  ns/op
> SecondarySupersLookup.testNegative16  avgt  15   20.657 ± 0.157  ns/op
> SecondarySupersLookup.testNegative20  avgt  15   20.669 ± 0.152  ns/op
> SecondarySupersLookup.testNegative30  avgt  15   20.668 ± 0.166  ns/op
> SecondarySupersLookup.testNegative32  avgt  15   20.669 ± 0.168  ns/op
> SecondarySupersLookup.testNegative40  avgt  15   20.668 ± 0.174  ns/op
> SecondarySupersLookup.testNegative50  avgt  15   20.682 ± 0.194  ns/op
> SecondarySupersLookup.testNegative55  avgt  15  113.369 ± 3.792  ns/op
> SecondarySupersLookup.testNegative56  avgt  15  113.888 ± 3.769  ns/op
> SecondarySupersLookup.testNegative57  avgt  15  115.320 ± 4.271  ns/op
> SecondarySupersLookup.testNegative58  avgt  15  115.648 ± 2.985  ns/op
> SecondarySupersLookup.testNegative59  avgt  15  117.730 ± 3.370  ns/op
> SecondarySupersLookup.testNegative60  avgt  15  142.533 ± 3.636  ns/op
> SecondarySupersLookup.testNegative61  avgt  15  144.901 ± 5.267  ns/op
> SecondarySupersLookup.testNegative62  avgt  15  145.926 ± 3.799  ns/op
> SecondarySupersLookup.testNegative63  avgt  15  207.704 ± 5....

Marked as reviewed by fyang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19852#pullrequestreview-2135945512

From gcao at openjdk.org Mon Jun 24 14:20:35 2024
From: gcao at openjdk.org (Gui Cao)
Date: Mon, 24 Jun 2024 14:20:35 GMT
Subject: RFR: 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path
In-Reply-To:
References:
Message-ID:

On Mon, 24 Jun 2024 09:25:31 GMT, Gui Cao wrote:

> Branch condition for r_array_index wraparound checking in lookup_secondary_supers_table_slow_path is wrong.
>
> // Check for wraparound.
> Label skip;
> bge(r_array_length, r_array_index, skip);
> mv(r_array_index, zr);
> bind(skip);
>
> As discussed at https://github.com/openjdk/jdk/pull/19320/files#r1650548279 . If length == index, then we must set index to 0. That is `blt(r_array_index,r_array_length,skip);`.
>
> ### Correctness testing:
> - [ ] Run tier1-3 tests on SOPHON SG2042 (release)
>
> ### JMH tested on SOPHON SG2042 (has not Zbb)
> without this patch:
>
> Benchmark                             Mode  Cnt    Score   Error  Units
> SecondarySupersLookup.testNegative00  avgt  15   20.649 ± 0.147  ns/op
> SecondarySupersLookup.testNegative01  avgt  15   20.649 ± 0.117  ns/op
> SecondarySupersLookup.testNegative02  avgt  15   20.637 ± 0.116  ns/op
> SecondarySupersLookup.testNegative03  avgt  15   20.638 ± 0.113  ns/op
> SecondarySupersLookup.testNegative04  avgt  15   20.638 ± 0.127  ns/op
> SecondarySupersLookup.testNegative05  avgt  15   20.639 ± 0.115  ns/op
> SecondarySupersLookup.testNegative06  avgt  15   20.638 ± 0.119  ns/op
> SecondarySupersLookup.testNegative07  avgt  15   20.850 ± 0.457  ns/op
> SecondarySupersLookup.testNegative08  avgt  15   20.842 ± 0.459  ns/op
> SecondarySupersLookup.testNegative09  avgt  15   20.650 ± 0.124  ns/op
> SecondarySupersLookup.testNegative10  avgt  15   20.642 ± 0.127  ns/op
> SecondarySupersLookup.testNegative16  avgt  15   20.657 ± 0.157  ns/op
> SecondarySupersLookup.testNegative20  avgt  15   20.669 ± 0.152  ns/op
> SecondarySupersLookup.testNegative30  avgt  15   20.668 ± 0.166  ns/op
> SecondarySupersLookup.testNegative32  avgt  15   20.669 ± 0.168  ns/op
> SecondarySupersLookup.testNegative40  avgt  15   20.668 ± 0.174  ns/op
> SecondarySupersLookup.testNegative50  avgt  15   20.682 ± 0.194  ns/op
> SecondarySupersLookup.testNegative55  avgt  15  113.369 ± 3.792  ns/op
> SecondarySupersLookup.testNegative56  avgt  15  113.888 ± 3.769  ns/op
> SecondarySupersLookup.testNegative57  avgt  15  115.320 ± 4.271  ns/op
> SecondarySupersLookup.testNegative58  avgt  15  115.648 ± 2.985  ns/op
> SecondarySupersLookup.testNegative59  avgt  15  117.730 ±
3.370 ns/op
> SecondarySupersLookup.testNegative60  avgt  15  142.533 ± 3.636  ns/op
> SecondarySupersLookup.testNegative61  avgt  15  144.901 ± 5.267  ns/op
> SecondarySupersLookup.testNegative62  avgt  15  145.926 ± 3.799  ns/op
> SecondarySupersLookup.testNegative63  avgt  15  207.704 ± 5....

@theRealAph Could you please take a look?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19852#issuecomment-2186684595

From duke at openjdk.org Mon Jun 24 14:51:20 2024
From: duke at openjdk.org (Ferenc Rakoczi)
Date: Mon, 24 Jun 2024 14:51:20 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To: <_0jtDLz3WT2dPvhlE3oi8s3pRETfC38Uvng1wwu1y3w=.406d44cf-7821-4e2c-be26-3194016ab89d@github.com>
References: <_0jtDLz3WT2dPvhlE3oi8s3pRETfC38Uvng1wwu1y3w=.406d44cf-7821-4e2c-be26-3194016ab89d@github.com>
Message-ID:

On Thu, 20 Jun 2024 18:32:14 GMT, Volodymyr Paprotski wrote:

> @ferakocz just tagging you as reminder of (the many) items in your queue :) Thanks!

Sorry, I was out of office last week. I will take a deeper look at the changes tomorrow, but I have a question based on my first look at it: Do you attribute the performance loss of the XDH code path to the mult() function returning an int instead of being void? Do you think that this prevented some optimization in the hotspot compiler?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2186762776

From rehn at openjdk.org Mon Jun 24 14:54:36 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 24 Jun 2024 14:54:36 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v16]
In-Reply-To:
References:
Message-ID:

> Hi all, please consider!
>
> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB).
> With a very small application, or one that runs for a very short time, we have fast patchable calls.
> But any normal application running longer will increase the code size and code churn/fragmentation.
> So whether or not your hot calls are fast relies on luck.
>
> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to.
> This would be the common case for a patchable call.
>
> Code stream:
> JAL
> Stubs:
> AUIPC
> LD
> JALR
>
>
>
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
>
> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
>
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
>
>
> An experimental option for turning trampolines back on exists.
>
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice versa (the current proposal of Zjid forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues which arm has).
>
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>
> fop (msec) 2239 | 2128 = 0.950424
> h2 (msec) 18660 | 16594 = 0.889282
> jython (msec) 22022 | 21925 = 0.995595
> luindex (msec) 2866 | 2842 = 0.991626
> lusearch (msec) 4108 | 4311 = 1.04942
> lusearch-fix (msec) 4406 | 4116 = 0.934181
> pmd (msec) 5976 | 5897 = 0.98678
> jython (msec) 22022 | 21925 = 0.995595
> Avg: 0.974112
> fop(xcomp) (msec) 2721 | 2714 = 0.997427
> h2(xcomp) (msec) 37719 | 38004 = 1.00756
> jython(xcomp) ...
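
The "in reach (+/- 1 MB)" condition mentioned in the description follows from the RISC-V JAL encoding, whose immediate is a signed 21-bit byte offset in multiples of 2. A minimal sketch of such a reach check is shown below; the function name `in_jal_reach` is hypothetical and this is not the HotSpot implementation, only an illustration of the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: returns true if a JAL placed at
// 'pc' can branch directly to 'dest'. The JAL offset is a signed, even
// byte offset in the range [-2^20, 2^20 - 2].
inline bool in_jal_reach(std::uint64_t pc, std::uint64_t dest) {
  const std::int64_t offset = static_cast<std::int64_t>(dest - pc);
  const std::int64_t limit = std::int64_t{1} << 20;  // 1 MiB
  return offset >= -limit && offset < limit && (offset & 1) == 0;
}
```

As the code cache grows past this window, fewer call sites satisfy the check, which is why the description treats the direct JAL as a matter of luck for longer-running applications.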
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  Missed in merge-fixes, minor revert

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19453/files
  - new: https://git.openjdk.org/jdk/pull/19453/files/ea013d08..77e5d855

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=14-15

  Stats: 12 lines in 3 files changed: 0 ins; 8 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/19453.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453

PR: https://git.openjdk.org/jdk/pull/19453

From rehn at openjdk.org Mon Jun 24 14:54:36 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 24 Jun 2024 14:54:36 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v12]
In-Reply-To:
References:
Message-ID:

On Fri, 21 Jun 2024 03:26:34 GMT, Fei Yang wrote:

>> As you can see in the diff, the mixing is pre-existing; I only changed names.
>>
>> Fixed.
>
> Ah, I see. I think you are right in using `MacroAssembler::max_patchable_far_call_stub_size()` at places where we call `MacroAssembler::max_trampoline_stub_size()` previously. Could you please revert this part? I think I misread the code before. Sorry.

Fixed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1651177730

From rehn at openjdk.org Mon Jun 24 15:17:13 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 24 Jun 2024 15:17:13 GMT
Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v14]
In-Reply-To: <6EvPetzLpHyHVD5tFoYg19hx9wbAkw1Pi3LoZFSp9yY=.a7dd6cef-532c-4a42-a09a-4a81c04e09a7@github.com>
References: <6EvPetzLpHyHVD5tFoYg19hx9wbAkw1Pi3LoZFSp9yY=.a7dd6cef-532c-4a42-a09a-4a81c04e09a7@github.com>
Message-ID:

On Fri, 21 Jun 2024 07:35:38 GMT, Fei Yang wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
>>
>> - Minor review comments
>> - Merge branch 'master' into 8332689
>> - To be pushed
>> - Merge branch 'master' into 8332689
>> - Review comments, removed dead code.
>> - Merge branch 'master' into 8332689
>> - Merge branch 'master' into 8332689
>> - Merge branch 'master' into 8332689
>> - Merge branch 'master' into 8332689
>> - Merge branch 'master' into 8332689
>> - ... and 12 more: https://git.openjdk.org/jdk/compare/d7dad50a...e47f2454
>
> src/hotspot/cpu/riscv/riscv.ad line 1244:
>
>> 1242: return 1 * NativeInstruction::instruction_size; // jal
>> 1243: }
>> 1244: return 3 * NativeInstruction::instruction_size; // auipc + ld + jalr
>
> Question: As we will only patch the address in the stub, do we still need the handling in compute_padding (`CallStaticJavaDirectNode::compute_padding` & `CallDynamicJavaDirectNode::compute_padding`) when `UseTrampolines` is false?

No, not that I know of.
But we would need additional fixes: some asserts need to be tweaked, and PostCallNop expects aligned calls (as we want to patch nop -> trap without crossing pages), so we would instead need padding after.

I think we can either have:
`c.nop, auipc, ld, jalr, nop (cmodx to trap), nop`
`auipc, ld, jalr, c.nop, nop (cmodx to trap), nop`

It seems to me that keeping what we have is simplest.
What do you reckon?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1651211204

From duke at openjdk.org Mon Jun 24 15:31:12 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 24 Jun 2024 15:31:12 GMT
Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]
In-Reply-To:
References: <_0jtDLz3WT2dPvhlE3oi8s3pRETfC38Uvng1wwu1y3w=.406d44cf-7821-4e2c-be26-3194016ab89d@github.com>
Message-ID:

On Mon, 24 Jun 2024 14:48:43 GMT, Ferenc Rakoczi wrote:

>> @ferakocz just tagging you as reminder of (the many) items in your queue :)
>> Thanks!
>
>> @ferakocz just tagging you as reminder of (the many) items in your queue :) Thanks!
>
> Sorry, I was out of office last week. I will take a deeper look at the changes tomorrow, but I have a question based on my first look at it: Do you attribute the performance loss of the XDH code path to the mult() function returning an int instead of being void? Do you think that this prevented some optimization in the hotspot compiler?

@ferakocz, now I was out for a long weekend...

> Do you attribute the performance loss of the XDH code path to the mult() function returning an int instead of being void? Do you think that this prevented some optimization in the hotspot compiler?

That's exactly it. I 'proved experimentally' that that's the case, though I haven't yet identified deterministically from the compilation logs which exact sequence of optimizations is missing. That's beyond me for now.

Identifying which optimization(s) are missing would be great in the long term, but since we are closing down commits for this release, I figured I should put something in as soon as possible. This PR essentially 'reverts' that part of my ECC PR to the original code, which in turn should be easiest to review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2186847183

From mli at openjdk.org Mon Jun 24 15:37:43 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 24 Jun 2024 15:37:43 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v8]
In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com>
Message-ID:

> Hi,
> Can you help to review the patch?
> This PR is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294).
>
> Compared with the previous PRs, the major change in this PR is to integrate the SLEEF source (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depend on external SLEEF artifacts (header or lib) at build or run time.
> Besides this change, the previous changes are also modified accordingly, e.g. some unnecessary files and changes are removed, especially in the make dir of the JDK.
>
> Besides the code changes, one important task is to handle the legal process.
>
> Thanks!
>
> ## Performance
> NOTE:
> * `Src` means the implementation in this PR, i.e. without a dependency on external SLEEF.
> * `Disabled` means intrinsics are disabled with `-XX:-UseVectorStubs`
> * `system_sleef` means the implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. the JDK is built and run with a dependency on external SLEEF.
>
> Basically, the perf data below shows that
> * this implementation has better performance than the previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294),
> * and both SLEEF versions have much better performance compared with the non-SLEEF version.
>
> |Benchmark |(size)|Src |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disabled-Src)/Src|
> |------------------------------|------|---------|-----|------------|----------------------|---------|-----------------|
> |3472:Double128Vector.ACOS |1024 |8546.842 |ns/op|8516.007 |-0.004 |16799.273|0.966 |
> |3473:Double128Vector.ASIN |1024 |6864.656 |ns/op|6987.328 |0.018 |16602.442|1.419 |
> |3474:Double128Vector.ATAN |1024 |11489.255|ns/op|12261.800 |0.067 |26329.320|1.292 |
> |3475:Double128Vector.ATAN2 |1024 |16661.170|ns/op|17234.472 |0.034 |42084.100|1.526 |
> |3476:Double128Vector.CBRT |1024 |18999.387|ns/op|20298.458 |0.068 |35998.688|0.895 |
> |3477:Double128Vector.COS |1024 |14081.857|ns/op|14846.117 |0.054 |24420.692|0.734 |
> |3478:Double128Vector.COSH |1024 |12202.306|ns/op|12237.772 |0.003 |21343.863|0.749 |
> |3479:Double128Vector.EXP |1024 |4553.108 |ns/op|4777.638 |0.049 |20155.903|3.427 |
> |3480:D...

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits:

 - merge master
 - sleef 3.6.1 for riscv
 - sleef 3.6.1
 - update header files for arm
 - add inline header file for riscv64
 - remove notes about sleef changes
 - fix performance issue
 - disable unused-function warnings; add log msg
 - minor
 - minor
 - ...
and 22 more: https://git.openjdk.org/jdk/compare/ed149062...fe4be2c6

-------------

Changes: https://git.openjdk.org/jdk/pull/18605/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=07
  Stats: 21668 lines in 21 files changed: 21624 ins; 1 del; 43 mod
  Patch: https://git.openjdk.org/jdk/pull/18605.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605

PR: https://git.openjdk.org/jdk/pull/18605

From azafari at openjdk.org Mon Jun 24 16:17:11 2024
From: azafari at openjdk.org (Afshin Zafari)
Date: Mon, 24 Jun 2024 16:17:11 GMT
Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v4]
In-Reply-To:
References:
Message-ID:

On Sun, 23 Jun 2024 06:09:37 GMT, Thomas Stuefe wrote:

>> Arenas carry NMT flags.
>>
>> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor.
>>
>> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable.
>>
>> The patch does that:
>> - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread)
>> - CompilerThread hands in mtCompiler, all other threads rely on the default
>> - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in
>> - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena
>> - it also allows us to make Arena::flags private
>>
>> Other, unrelated cleanups:
>> - Made Arena::_size_in_bytes and Arena::_tag private
>> - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor
>> - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code.
>>
>> Tests:
>>
>> I manually verified that the NMT numbers printed don't change.
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
> - fix copyrights
> - merge
> - feedback david
> - Merge branch 'master' into arena-constify-memflags
> - feedback johan
> - Merge branch 'master' into arena-constify-memflags
> - start

Usually, when there are other pending reviews, I hold off on approving a PR.

-------------

Marked as reviewed by azafari (Committer).

PR Review: https://git.openjdk.org/jdk/pull/19693#pullrequestreview-2136239119

From mdoerr at openjdk.org Mon Jun 24 18:41:12 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 24 Jun 2024 18:41:12 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 17 Jun 2024 10:19:46 GMT, Amit Kumar wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test  avgt  15  0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test  avgt  15  1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt  15  1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt  15  1.410 ±
0.017 ns/op
>>
>> Benchmark                             Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00  avgt  15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01  avgt  15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02  avgt  15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03  avgt  15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04  avgt  15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05  avgt  15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06  avgt  15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07  avgt  15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08  avgt  15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09  avgt  15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10  avgt  15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16  avgt  15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20  avgt  15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30  avgt  15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32  avgt  15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40  avgt  15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50  avgt  15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55  avgt  15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56  avgt  15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57  avgt  15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58  avgt  15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59  avgt  15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60  avgt  15  27.118 ± 0.557 ...
> > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - rename: r_scratch to r_result in repne_scan method src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3173: > 3171: z_cg(r_value, Address(r_addr)); > 3172: z_bre(L_exit); // branch on success > 3173: add2reg(r_addr, wordSize); Can add2reg change CC? I'd probably rather use la / lay for the addition without changing CC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651467130 From mdoerr at openjdk.org Mon Jun 24 18:47:12 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 24 Jun 2024 18:47:12 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 10:19:46 GMT, Amit Kumar wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ±
0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ... > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - rename: r_scratch to r_result in repne_scan method src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3171: > 3169: > 3170: bind(L_loop); > 3171: z_cg(r_value, Address(r_addr)); Idea: I think it is possible to use z_xg to do both at once: Set to 0 iff equal and CC to equal. (Like PPC64.)
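[Editorial illustration, not part of the original message.] The "do both at once" idea can be sketched in plain C++ rather than s390 assembly: XOR-ing the searched value with a table entry yields zero exactly when the two are equal, so one operation serves as both the comparison and the result value. The function name `scan_with_xor` and the "0 means found" convention are assumptions made for this sketch; it is not the `repne_scan` code from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the scan loop discussed above; not HotSpot code.
// Returns 0 if `value` occurs in table[0..count), and a non-zero value otherwise.
// The XOR both compares (the result is zero iff the operands are equal) and
// produces the value that is returned.
inline std::uint64_t scan_with_xor(const std::uint64_t* table, std::size_t count,
                                   std::uint64_t value) {
  std::uint64_t result = 1;  // non-zero means "not found" (also covers count == 0)
  for (std::size_t i = 0; i < count; ++i) {
    result = value ^ table[i];
    if (result == 0) {
      break;  // match: result is already the "success" value
    }
  }
  return result;
}
```

For example, scanning `{3, 5, 7}` for `5` returns zero, while scanning for `4` returns a non-zero value. Whether the same trick pays off in the actual instruction sequence depends on register usage there, which this sketch does not model.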
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651472649 From gziemski at openjdk.org Mon Jun 24 21:06:10 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Jun 2024 21:06:10 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v5] In-Reply-To: References: Message-ID: On Sat, 22 Jun 2024 10:37:27 GMT, Thomas Stuefe wrote: >> Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > cleanups LGTM thanks. ------------- Marked as reviewed by gziemski (Committer). PR Review: https://git.openjdk.org/jdk/pull/19655#pullrequestreview-2136741864 From gziemski at openjdk.org Mon Jun 24 21:15:13 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 24 Jun 2024 21:15:13 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v4] In-Reply-To: References: Message-ID: On Sun, 23 Jun 2024 06:09:37 GMT, Thomas Stuefe wrote: >> Arenas carry NMT flags. >> >> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. >> >> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. 
>> >> The patch does that: >> - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) >> - CompilerThread hands in mtCompiler, all other threads rely on the default >> - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in >> - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena >> - it also allows us to make Arena::flags private >> >> Other, unrelated cleanups: >> - Made Arena::_size_in_bytes and Arena::_tag private >> - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor >> - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. >> >> Tests: >> >> I manually verified that the NMT numbers printed don't change. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - fix copyrights > - merge > - feedback david > - Merge branch 'master' into arena-constify-memflags > - feedback johan > - Merge branch 'master' into arena-constify-memflags > - start LGTM, thanks. ------------- Marked as reviewed by gziemski (Committer). PR Review: https://git.openjdk.org/jdk/pull/19693#pullrequestreview-2136768146 From iklam at openjdk.org Mon Jun 24 21:24:34 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Jun 2024 21:24:34 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time Message-ID: Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. 
- This PR uses the same framework introduced in #19355 and just adds handling for methods. - More filtering is needed when building the default archive in the JDK: constant pool resolution when running the `build.tools.classlist.HelloClasslist` program is not deterministic (due to concurrency in core library classes). This could cause different values in the `@cp` lines in the classlist file. The benefit of pre-resolved constant pool entries is more visible for custom archives and not so much for the default archive in the JDK, so we disable this optimization for the default CDS archive, until we can find a way to make it deterministic. ------------- Commit messages: - Fixed whitespaces - 8309634: Resolve CONSTANT_MethodRef at CDS dump time Changes: https://git.openjdk.org/jdk/pull/19866/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19866&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309634 Stats: 355 lines in 14 files changed: 311 ins; 8 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/19866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19866/head:pull/19866 PR: https://git.openjdk.org/jdk/pull/19866 From iklam at openjdk.org Mon Jun 24 21:41:08 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Jun 2024 21:41:08 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 17:21:18 GMT, Ioi Lam wrote: > Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. > > - This PR uses the same framework introduced in #19355 and just adds handling for methods. > - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) Note: the makefile changes are now in a separate PR (#19868) which will be integrated before this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19866#issuecomment-2187453536 From kvn at openjdk.org Mon Jun 24 23:44:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Jun 2024 23:44:33 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out Message-ID: The timeout is caused by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. The test continuously deoptimizes and recompiles the `java.lang.Throwable::` method. `-XX:+VerifyOops` adds a lot of external addresses because it uses ExternalAddress for error messages. These messages are unique for each call to `verify_oop()` because they are constructed locally. I reduced the number of loop iterations by a factor of 10 to get a reasonable execution time (2 mins instead of 20 mins) and got the following data: Without VerifyOops: External addresses table: 38 entries With VerifyOops: External addresses table: 125922 entries It looks like the VM spends most of the time growing/reallocating the big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). Before that change these addresses were recorded locally in the nmethod's relocation info. When the nmethod was deoptimized, its relocation info was discarded together with the nmethod. Only on x86 did we declare the message address as ExternalAddress. AArch64 uses movptr() to store the address as a simple pointer. ARM uses its own InlineAddress with the RelocInfo::none type of relocation. I verified the other platforms: none uses ExternalAddress. I suggest using AddressLiteral with RelocInfo::none for x86.
With that, the global table is small even with -XX:+VerifyOops: External addresses table: 42 entries Tested tier1; ran the test with the corresponding flags to verify that the time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) ------------- Commit messages: - 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out Changes: https://git.openjdk.org/jdk/pull/19871/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19871&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334779 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19871/head:pull/19871 PR: https://git.openjdk.org/jdk/pull/19871 From dlong at openjdk.org Tue Jun 25 03:23:10 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Jun 2024 03:23:10 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is caused by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimizes and recompiles the `java.lang.Throwable::` method. > `-XX:+VerifyOops` adds a lot of external addresses because it uses ExternalAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced the number of loop iterations by a factor of 10 to get a reasonable execution time (2 mins instead of 20 mins) and got the following data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > It looks like the VM spends most of the time growing/reallocating the big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in the nmethod's relocation info. When the nmethod was deoptimized, its relocation info was discarded together with the nmethod.
> > Only on x86 did we declare the message address as ExternalAddress. AArch64 uses movptr() to store the address as a simple pointer. ARM uses its own InlineAddress with the RelocInfo::none type of relocation. I verified the other platforms: none uses ExternalAddress. > > I suggest using AddressLiteral with RelocInfo::none for x86. With that, the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1; ran the test with the corresponding flags to verify that the time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) Where are these strings actually located? Do they need to be relocated if the CodeBuffer expands? Can we assert that uses of ExternalAddress do not point inside the CodeBuffer/CodeBlob? You might catch additional cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2187893564 From amitkumar at openjdk.org Tue Jun 25 03:29:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 03:29:12 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 18:39:02 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp >> - rename: r_scratch to r_result in repne_scan method > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3173: > >> 3171: z_cg(r_value, Address(r_addr)); >> 3172: z_bre(L_exit); // branch on success >> 3173: add2reg(r_addr, wordSize); > > Can add2reg change CC? I'd probably rather use z_la for the addition without changing CC. I just noticed that `PreferLAoverADD` is set to false by default. I'm looking for the reason for that; maybe we can set it to true. Otherwise I'll update it to z_la.
CC: @RealLucy ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651937491 From cjplummer at openjdk.org Tue Jun 25 03:41:10 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 25 Jun 2024 03:41:10 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. > > Tested with tier1-7. The changes pass all my jdi, jdwp, jdb, and jvmti testing, including running most of the test suites 10 times or more. It also passes the test case in the CR. ------------- PR Review: https://git.openjdk.org/jdk/pull/19755#pullrequestreview-2137260290 From stuefe at openjdk.org Tue Jun 25 05:09:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 05:09:16 GMT Subject: RFR: 8334223: Make Arena MEMFLAGs immutable [v4] In-Reply-To: References: Message-ID: On Sun, 23 Jun 2024 06:09:37 GMT, Thomas Stuefe wrote: >> Arenas carry NMT flags. >> >> An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. >> >> As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. 
>> >> The patch does that: >> - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) >> - CompilerThread hands in mtCompiler, all other threads rely on the default >> - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in >> - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena >> - it also allows us to make Arena::flags private >> >> Other, unrelated cleanups: >> - Made Arena::_size_in_bytes and Arena::_tag private >> - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor >> - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. >> >> Tests: >> >> I manually verified that the NMT numbers printed don't change. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - fix copyrights > - merge > - feedback david > - Merge branch 'master' into arena-constify-memflags > - feedback johan > - Merge branch 'master' into arena-constify-memflags > - start Thanks all! 
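[Editorial illustration, not part of the original message.] The design described in the PR text above (the flag is handed in at construction, the thread class defaults it, the compiler thread overrides it, and the arena's flag becomes const) can be sketched as follows. All types here are simplified, hypothetical stand-ins: `MemFlag`, `kDefaultInitSize`, and the class layouts are assumptions for the sketch, not HotSpot's actual `MEMFLAGS`, `Chunk::init_size`, `Arena`, or `Thread` definitions.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-ins for the HotSpot types named above.
enum class MemFlag { mtThread, mtCompiler };

class Arena {
 public:
  // Placeholder for Chunk::init_size; the real value is not taken from the PR.
  static constexpr std::size_t kDefaultInitSize = 1024;

  // One constructor with a defaulted size replaces two separate constructors.
  explicit Arena(MemFlag flag, std::size_t init_size = kDefaultInitSize)
      : _flag(flag), _init_size(init_size) {}

  MemFlag flag() const { return _flag; }
  std::size_t init_size() const { return _init_size; }

 private:
  const MemFlag _flag;  // fixed at construction; there is no setter
  const std::size_t _init_size;
};

class Thread {
 public:
  // The flag is passed in up front, so both arenas are created with the right
  // accounting category and never need to be re-tagged afterwards.
  explicit Thread(MemFlag flag = MemFlag::mtThread)
      : _resource_area(flag), _handle_area(flag) {}

  const Arena& resource_area() const { return _resource_area; }
  const Arena& handle_area() const { return _handle_area; }

 private:
  Arena _resource_area;
  Arena _handle_area;
};

class CompilerThread : public Thread {
 public:
  CompilerThread() : Thread(MemFlag::mtCompiler) {}
};
```

In this sketch a default-constructed `Thread` has arenas tagged `mtThread`, while a `CompilerThread` has both arenas tagged `mtCompiler` from the start, so nothing comparable to `bias_to` is needed.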
------------- PR Comment: https://git.openjdk.org/jdk/pull/19693#issuecomment-2187986765 From stuefe at openjdk.org Tue Jun 25 05:09:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 05:09:16 GMT Subject: Integrated: 8334223: Make Arena MEMFLAGs immutable In-Reply-To: References: Message-ID: <7ZogVzDyiUuqMVsE_g6zfBEFSAyjvCdsB9v86FIOjTo=.b39c4c6a-792d-4660-8cd0-1a628e0df9d1@github.com> On Thu, 13 Jun 2024 11:59:05 GMT, Thomas Stuefe wrote: > Arenas carry NMT flags. > > An arena should never change that flag. But it does: Arenas (as ResourceAreas), used by CompilerThread, are accounted toward mtCompiler. But since the RA is already created in the parent class constructor (as mtThread), we then have to awkwardly change the flag of an already existing RA in the CompilerThread constructor. > > As a prerequisite for future NMT work I would like Arena MEMFLAGS to be immutable. > > The patch does that: > - we hand in MEMFLAGS to the Thread constructor now (defaults to mtThread) > - CompilerThread hands in mtCompiler, all other threads rely on the default > - on creation, both ResourceArea and HandleArea are now accounted toward the flag handed in > - that allows us to make Arena::flags const, and to remove ResourceArea::bias_to which changed the flag in-flight for the arena > - it also allows us to make Arena::flags private > > Other, unrelated cleanups: > - Made Arena::_size_in_bytes and Arena::_tag private > - Merged both Arena constructors into one by specifying a default value of `Chunk::init_size` for `init_size` argument. That makes it equivalent to the old `Arena(flag, tag)` constructor > - removed `JavaThread::JavaThread(bool)`. That constructor was used when creating threads that are getting attached. There was only a single use for that constructor, and I replaced it with functionally equivalent code. > > Tests: > > I manually verified that the NMT numbers printed don't change. This pull request has now been integrated. 
Changeset: 974dca80 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/974dca80df71c5cbe492d1e8ca5cee76bcc79358 Stats: 77 lines in 12 files changed: 15 ins; 36 del; 26 mod 8334223: Make Arena MEMFLAGs immutable Reviewed-by: jsjolen, azafari, gziemski ------------- PR: https://git.openjdk.org/jdk/pull/19693 From stuefe at openjdk.org Tue Jun 25 05:13:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 05:13:12 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v5] In-Reply-To: References: Message-ID: On Sun, 23 Jun 2024 08:55:22 GMT, Johan Sjölen wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanups > > src/hotspot/share/nmt/nativeCallStackPrinter.cpp line 40: > >> 38: for (int i = 0; i < NMT_TrackingStackDepth; i++) { >> 39: const address pc = stack->get_frame(i); >> 40: if (pc != nullptr) { > > Style suggestion: Invert condition and `continue` instead, to reduce indentation of remaining code. Just occurred to me that null designates the end of the stack. I'll just break out then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1651995435 From dholmes at openjdk.org Tue Jun 25 07:08:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 25 Jun 2024 07:08:13 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII In-Reply-To: References: Message-ID: On Fri, 21 Jun 2024 16:17:43 GMT, Thomas Stuefe wrote: > I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. I don't follow - you didn't change/add any callsites so why would you need to cast away constness to "feed" the existing call ?? More comments below. I like the idea in general.
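[Editorial illustration, not part of the original message.] The feature under review in this thread, a hex dump that takes a const pointer and optionally appends an ASCII column, can be sketched in standalone C++. The helper name `hex_dump`, its parameters, and the choice of '.' for non-printable bytes are assumptions for this sketch; it does not reproduce the `os::print_hex_dump` signature or output format from the PR.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not the os::print_hex_dump signature from the PR.
// Formats `len` bytes starting at `p` (accepted as a const pointer) as rows of
// hex bytes, optionally followed by an ASCII column in which non-printable or
// non-ASCII bytes are shown as '.'.
inline std::string hex_dump(const void* p, std::size_t len, bool print_ascii = true,
                            std::size_t bytes_per_line = 16) {
  std::string out;
  if (bytes_per_line == 0) {
    return out;  // nothing sensible to print
  }
  const unsigned char* bytes = static_cast<const unsigned char*>(p);
  char buf[8];
  for (std::size_t row = 0; row < len; row += bytes_per_line) {
    std::string ascii;
    for (std::size_t i = 0; i < bytes_per_line; ++i) {
      if (row + i < len) {
        const unsigned char c = bytes[row + i];
        std::snprintf(buf, sizeof(buf), "%02x ", static_cast<unsigned>(c));
        out += buf;
        ascii += (c < 0x80 && std::isprint(c)) ? static_cast<char>(c) : '.';
      } else {
        out += "   ";  // pad a short final row so the ASCII column lines up
      }
    }
    if (print_ascii) {
      out += ' ';
      out += ascii;
    }
    out += '\n';
  }
  return out;
}
```

Because the input is `const void*`, callers holding const memory need no cast. The placeholder character is a presentation choice only; swapping '.' for '_' or '?' changes a single expression.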
src/hotspot/share/runtime/os.cpp line 944: > 942: } > 943: > 944: ATTRIBUTE_NO_ASAN static bool read_safely_from(const uint8_t* p, uintptr_t* result) { I don't understand the change to `uint8_t` - that is just an unsigned char, so casting to `intptr_t` we could have a misaligned value. ?? src/hotspot/share/runtime/os.cpp line 961: > 959: union { > 960: uint64_t v; > 961: uint8_t c[sizeof(v)]; Why `uint8_t` instead of `unsigned char`? src/hotspot/share/runtime/os.cpp line 966: > 964: const int idx = LITTLE_ENDIAN_ONLY(i) BIG_ENDIAN_ONLY(sizeof(u.v) - 1 - i); > 965: const uint8_t c = u.c[idx]; > 966: ascii_form.put(isprint(c) && isascii(c) ? c : '_'); Isn't it customary to print a `?` if the char is unprintable? src/hotspot/share/runtime/os.hpp line 860: > 858: static void print_hex_dump(outputStream* st, const uint8_t* start, const uint8_t* end, int unitsize, bool print_ascii, > 859: int bytes_per_line, const uint8_t* logical_start); > 860: static void print_hex_dump(outputStream* st, const uint8_t* start, const uint8_t* end, int unitsize, bool print_ascii = true) { If you have made this a default parameter why did you change so many call-sites to pass `true`? Can I suggest when you do pass true/false you annotate it as below e.g. 
`true /* print_ascii */` ------------- PR Review: https://git.openjdk.org/jdk/pull/19835#pullrequestreview-2137506667 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1652090760 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1652092990 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1652093561 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1652100163 From stefank at openjdk.org Tue Jun 25 07:57:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 25 Jun 2024 07:57:37 GMT Subject: [jdk23] RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved Message-ID: Hi all, This pull request contains a backport of commit [31e8deba](https://github.com/openjdk/jdk/commit/31e8debae63e008da79e403bcb870a7be631af2c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Liming Liu on 17 Jun 2024 and was reviewed by Stefan Karlsson, Johan Sjölen and Thomas Stuefe. Thanks!
------------- Commit messages: - Backport 31e8debae63e008da79e403bcb870a7be631af2c Changes: https://git.openjdk.org/jdk/pull/19877/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19877&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324781 Stats: 13 lines in 3 files changed: 4 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19877/head:pull/19877 PR: https://git.openjdk.org/jdk/pull/19877 From fyang at openjdk.org Tue Jun 25 07:58:18 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Jun 2024 07:58:18 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v14] In-Reply-To: References: <6EvPetzLpHyHVD5tFoYg19hx9wbAkw1Pi3LoZFSp9yY=.a7dd6cef-532c-4a42-a09a-4a81c04e09a7@github.com> Message-ID: <16PGG6zZKUSrSXS3s4I20nelQmJrYVWxaaB2IKKuJqA=.521e66ac-90c3-4e2e-92ff-500ff6429b57@github.com> On Mon, 24 Jun 2024 15:15:02 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 1244: >> >>> 1242: return 1 * NativeInstruction::instruction_size; // jal >>> 1243: } >>> 1244: return 3 * NativeInstruction::instruction_size; // auipc + ld + jalr >> >> Question: As we will only patch the address in the stub, do we still need the handling in compute_padding (`CallStaticJavaDirectNode::compute_padding` & `CallDynamicJavaDirectNode::compute_padding`) when `UseTrampolines` is false? > > No, not that I know of. But we need additional fixes, some asserts needs to tweaked and PostCallNop expects aligned calls (as we want to patch nop -> trap not crossing pages) so we instead need padding after. > > I think we can either have: > `c.nop, auipc, ld, jalr, nop (cmodx to trap), nop` > `auipc, ld, jalr, c.nop, nop (cmodx to trap), nop` > > It seems to me keeping what we have is just simplest. > > What you reckon? Ah, I didn't notice the PostCallNop at the end. I agree it will be safer to keep the current shape. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1652184652 From fyang at openjdk.org Tue Jun 25 07:58:17 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Jun 2024 07:58:17 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v16] In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 14:54:36 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time, we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Missed in merge-fixes, minor revert src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3718: > 3716: // is secondary_supers[r_array_index]. Bits 0 and 1 in the bitmap > 3717: // have been checked. > 3718: rt_call(StubRoutines::lookup_secondary_supers_table_slow_path_stub()); Why not make use of the `stub_is_near` param and do a simpler `jump_link` when the slow path stub is near? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1651883277 From stuefe at openjdk.org Tue Jun 25 08:27:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 08:27:12 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 07:05:47 GMT, David Holmes wrote: > > I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. > > I don't follow - you didn't change/add any callsites so why would you need to cast away constness to "feed" the existing call ?? > Mostly in tests. I rewrote them several times, one version was fed from const memory. If this aspect turns out to be contentious, I'll revert it. I still think this function should accept const pointers as input. I chose uint8_t since it seemed to me the closest const version of address. 
address, however, is signed char, but I did not want to pass const char* since that looks like a string. I can use unsigned char*, sure. Or, maybe just const void*. Both are fine to me. > More comments below. > > I like the idea in general. > src/hotspot/share/runtime/os.cpp line 966: > >> 964: const int idx = LITTLE_ENDIAN_ONLY(i) BIG_ENDIAN_ONLY(sizeof(u.v) - 1 - i); >> 965: const uint8_t c = u.c[idx]; >> 966: ascii_form.put(isprint(c) && isascii(c) ? c : '_'); > > Isn't it customary to print a `?` if the char is unprintable? It depends; I also know hex dumps with dot (.) and blanks. ? It looks a bit noisy, visually. I think _ or . or blank make the real ascii portions stick out better. I can live with anything, but would prefer dot or underscore. Dot seems to be the most common, thinking about it. > If you have made this a default parameter why did you change so many call-sites to pass `true`? > Good catch. I'll revert the call site changes and leave the default parameter. > Can I suggest when you do pass true/false you annotate it as below e.g. `true /* print_ascii */` Sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2188273005 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1652234333 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1652238732 From rehn at openjdk.org Tue Jun 25 08:48:18 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 25 Jun 2024 08:48:18 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v16] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 02:05:42 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Missed in merge-fixes, minor revert > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3718: > >> 3716: // is secondary_supers[r_array_index]. Bits 0 and 1 in the bitmap >> 3717: // have been checked. 
>> 3718: rt_call(StubRoutines::lookup_secondary_supers_table_slow_path_stub()); > > Why not make use of the `stub_is_near` param and do a simpler `jump_link` when the slow path stub is near? As there were no users of jump_link before this merge, except the trampoline call, it always emits JAL, as that is the only encoding we need. Hence we only have one place where we use JAL for calls: the trampoline call. If we think we should use JAL for very short calls, manually adding it in one place is not the right approach IMHO. Instead we should have rt_call emit JAL for runtime addresses, i.e. a fixed address in libjvm.so or a fixed address in the code cache, if reachable. For example, I think this site, in gen_continuation_enter(): `__ rt_call(CAST_FROM_FN_PTR(address, StubRoutines::cont_thaw()));` could benefit from using JAL. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1652279798 From lucy at openjdk.org Tue Jun 25 09:42:12 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 25 Jun 2024 09:42:12 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 10:19:46 GMT, Amit Kumar wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ±
0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ... > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - rename: r_scratch to r_result in repne_scan method Sorry, I was distracted yesterday and did not finish my review.
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3231: > 3229: // We test the MSB of r_array_index, i.e., its sign bit > 3230: testbit(r_array_index, 63); > 3231: // TODO: load immediate on condition could be used here; How would that help? You have to branch on condition anyway. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3259: > 3257: // Is there another entry to check? Consult the bitmap. > 3258: testbit(r_bitmap, (bit + 1) & Klass::SECONDARY_SUPERS_TABLE_MASK); > 3259: // TODO: load immediate on condition could be use here; How would that help? You have to branch on condition anyway. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3264: > 3262: // Linear probe. Rotate the bitmap so that the next bit to test is > 3263: // in Bit 2 for the look-ahead check in the slow path. > 3264: if (bit) { I don't like that. It's C style. Please use `(bit != 0)`. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3275: > 3273: call_stub(StubRoutines::lookup_secondary_supers_table_slow_path_stub()); > 3274: > 3275: z_bru(L_done); // pass whatever result we got from a slow path This one branch could be saved by using "load immediate on condition". But it's after slow path processing. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3325: > 3323: assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), ""); > 3324: > 3325: z_cghi(r_bitmap, (long)-1); Why not compare against `(unitx)Klass::SECONDARY_SUPERS_BITMAP_FULL`? ------------- Changes requested by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19544#pullrequestreview-2135877797 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651084089 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651084454 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651077231 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651094177 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1651113235 From jsjolen at openjdk.org Tue Jun 25 09:50:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 25 Jun 2024 09:50:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v29] In-Reply-To: References: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> Message-ID: On Mon, 24 Jun 2024 14:01:09 GMT, Thomas Stuefe wrote: >really test the AWFL (new acronym, yay). Can't wait for templatized indices so I can write ```c++ using AWFUL = ArrayWithFreeListAllocator; >Interesting to see would be that it works, that alignment is correct, that nothing breaks across resizes (indexes stay stable, etc.). >I wont insist of it, just thinking that the current test complexity is somewhat wasted. I think those tests make sense if we replace the `GrowableArray`as the backing memory area with something else, otherwise that's just testing the GA. OK, how about we scrap the tests for when moving this to utilities and having the index being a part of the template? I've got that in the pipeline already, so it's not going to take too long I hope. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1652386337 From lucy at openjdk.org Tue Jun 25 10:06:14 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 25 Jun 2024 10:06:14 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v6] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Sun, 16 Jun 2024 09:49:42 GMT, Amit Kumar wrote: >> s390x port for recursive locking. >> >> testing: >> - [x] build fastdebug-vm >> - [x] build slowdebug-vm >> - [x] build release-vm >> - [x] build optimized-vm >> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (release-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] tier1 with fastdebug-vm >> - [x] tier1 with slowdebug-vm >> - [x] tier1 with release-vm >> >> *BenchMarks*: >> >> Results from Performance LPARs : >> >> >> Locking Mode = 1 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> Locking Mode = 1 (with patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 
0.519 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> >> >> >> Locking Mode = 2 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 2.229 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 0.416 ns/op >> LockUnlock.te... > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into recursive_locking_v1 > - not using load_const_optimized in compiler_fast_lock_lightweight_object > - minor code formatting & variable renamings > - revert DiagnoseSyncOnValueBasedClasses changes from c1 > - suggestions from Axel > - Merge branch 'master' into recursive_locking_v1 > - s390x recursive locking port Looks good to me - except for that one comment incorrectness. src/hotspot/cpu/s390/s390.ad line 9614: > 9612: // If unlocking was successful, cc should indicate 'EQ'. > 9613: // The compiler generates a branch to the runtime call to > 9614: // _complete_monitor_unlocking_Java for the case where cc is 'NE'. Shouldn't the comment talk about locking here? ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18878#pullrequestreview-2138083012 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1652399704 From amitkumar at openjdk.org Tue Jun 25 11:06:26 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 11:06:26 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v6] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Tue, 25 Jun 2024 09:55:21 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge branch 'master' into recursive_locking_v1 >> - not using load_const_optimized in compiler_fast_lock_lightweight_object >> - minor code formatting & variable renamings >> - revert DiagnoseSyncOnValueBasedClasses changes from c1 >> - suggestions from Axel >> - Merge branch 'master' into recursive_locking_v1 >> - s390x recursive locking port > > src/hotspot/cpu/s390/s390.ad line 9614: > >> 9612: // If unlocking was successful, cc should indicate 'EQ'. >> 9613: // The compiler generates a branch to the runtime call to >> 9614: // _complete_monitor_unlocking_Java for the case where cc is 'NE'. > > Shouldn't the comment talk about locking here? Yes it should!!! Please take a look at latest commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1652552902 From amitkumar at openjdk.org Tue Jun 25 11:06:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 11:06:23 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v7] In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: > s390x port for recursive locking. 
> > testing: > - [x] build fastdebug-vm > - [x] build slowdebug-vm > - [x] build release-vm > - [x] build optimized-vm > - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (release-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] tier1 with fastdebug-vm > - [x] tier1 with slowdebug-vm > - [x] tier1 with release-vm > > *BenchMarks*: > > Results from Performance LPARs : > > > Locking Mode = 1 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > Locking Mode = 1 (with patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 0.519 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > > > > Locking Mode = 2 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 2.229 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 
0.416 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 424.241 ± 0.840 ns/op > Finished running test 'micro:vm.lang.Lo... Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: comments from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18878/files - new: https://git.openjdk.org/jdk/pull/18878/files/e6c23f53..88058685 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18878.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18878/head:pull/18878 PR: https://git.openjdk.org/jdk/pull/18878 From rehn at openjdk.org Tue Jun 25 11:42:24 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 25 Jun 2024 11:42:24 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v17] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > With a very small application, or one running for a very short time, we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not you get hot fast calls relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way. > Loading the address avoids the pitfalls of cmodx.
> > This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice versa (the current proposal of Zjid forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues which Arm has). > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... 
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19453/files - new: https://git.openjdk.org/jdk/pull/19453/files/77e5d855..ea57c42b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=15-16 Stats: 13 lines in 2 files changed: 11 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From stuefe at openjdk.org Tue Jun 25 12:23:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 12:23:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v29] In-Reply-To: References: <_UzjquVa1CRphN881l-geAXesq0zLI9GGsie512-bhs=.57d37312-7aca-4fc5-b59e-a2f121e77584@github.com> Message-ID: On Tue, 25 Jun 2024 09:47:53 GMT, Johan Sj?len wrote: >> Hmm. I am not sold on the "example" benefit. The array is quite easy in itself. >> >> You could scratch the lists, since their implementations don't do anything to really test the AWFL (new acronym, yay). For the saved LOCs could expand the tests to test with a collection of various types, e.g. >> - u1, u2, u8 >> - unaligned structures that need alignment (e.g. struct (void*; int; )) >> - trivial objects >> >> Interesting to see would be that it works, that alignment is correct, that nothing breaks across resizes (indexes stay stable, etc.). >> >> I wont insist of it, just thinking that the current test complexity is somewhat wasted. > >>really test the AWFL (new acronym, yay). > > Can't wait for templatized indices so I can write > > ```c++ > using AWFUL = ArrayWithFreeListAllocator; > > >>Interesting to see would be that it works, that alignment is correct, that nothing breaks across resizes (indexes stay stable, etc.). 
> >>I wont insist of it, just thinking that the current test complexity is somewhat wasted. > > I think those tests make sense if we replace the `GrowableArray`as the backing memory area with something else, otherwise that's just testing the GA. > > OK, how about we scrap the tests for when moving this to utilities and having the index being a part of the template? I've got that in the pipeline already, so it's not going to take too long I hope. Fine for me ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1652705851 From stuefe at openjdk.org Tue Jun 25 12:45:22 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 12:45:22 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2] In-Reply-To: References: Message-ID: > Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. As usual with hex dumps, ascii follows hex printout > > Example: > > > > 118 0x00000000000001c0: 204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820 OpenJDK 64-Bit Server VM (fastde > 119 0x00000000000001e0: 692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f bug 24-internal-adhoc.thomas.sou > 120 0x0000000000000200: 726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820 rce) for linux-amd64 JRE (24-int > 121 0x0000000000000220: 64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965 ernal-adhoc.thomas.source), buil > 122 0x0000000000000240: 323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a t on 2024-06-20T10:45:12Z with g > 123 0x0000000000000260: 2e352e3031206363 0000000000000030 0000000000000000 0000000000000000 cc 10.5.0_______________________ > 124 0x0000000000000280: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ > 125 0x00000000000002a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
________________________________ > > > The patch does that. > > Small unrelated changes: > > - I rewrote and extended the gtests, testing now a real-life printout containing a mixture or readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved. > > - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t). > > - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar. > > ---- > > Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-endian machines and therefore made those changes blindly. ... Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision: - copyrights - const_address instead of const uint8_t* - use dot instead of underscore for unprintable - rely on default true - Revert "fix copyrights" This reverts commit 2b8bc55e53a88d13cd268dc89ebaac7fe42f60d5. 
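For readers following the ASCII column discussed in this thread, here is a minimal sketch of the idea in plain standard C++. It is not the HotSpot code: `ascii_column` and its `fill` parameter are hypothetical names, and it walks bytes in memory order rather than the endian-aware per-word order the patch uses. Printable ASCII bytes are kept; everything else becomes the fill character (a dot, as settled on in the review).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of os::print_hex_dump.
// Builds the ASCII column for `len` bytes starting at `p`.
// Bytes outside the printable ASCII range 0x20..0x7e are replaced by `fill`.
std::string ascii_column(const uint8_t* p, size_t len, char fill = '.') {
  std::string out;
  out.reserve(len);
  for (size_t i = 0; i < len; i++) {
    const uint8_t c = p[i];
    const bool printable = (c >= 0x20 && c <= 0x7e);
    out.push_back(printable ? static_cast<char>(c) : fill);
  }
  return out;
}
```

Taking a pointer-to-const here mirrors the motivation given for `const_address`: the printer only reads the memory, so callers should not have to cast constness away.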
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19835/files - new: https://git.openjdk.org/jdk/pull/19835/files/17e4ce2a..648c4d4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19835&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19835&range=00-01 Stats: 38 lines in 10 files changed: 2 ins; 0 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/19835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19835/head:pull/19835 PR: https://git.openjdk.org/jdk/pull/19835 From stuefe at openjdk.org Tue Jun 25 12:51:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 12:51:13 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 07:05:47 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision: >> >> - copyrights >> - const_address instead of const uint8_t* >> - use dot instead of underscore for unprintable >> - rely on default true >> - Revert "fix copyrights" >> >> This reverts commit 2b8bc55e53a88d13cd268dc89ebaac7fe42f60d5. > >> I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. > > I don't follow - you didn't change/add any callsites so why would you need to cast away constness to "feed" the existing call ?? > > More comments below. > > I like the idea in general. Hi @dholmes-ora, thanks for your review! I hopefully addressed all your concerns. In particular: - I now rely on `true` for the default of print_ascii, and rolled back changes to all callsites that explicitly passed true. I also decorated the parameters with comments - I changed read_safely_from to take an uintptr_t* pointer. I would like to retain the signed->unsigned change, otherwise I would have to cast more in the 32-bit path. 
- I introduced a new `const_address` typedef to complement the ubiquitous `address`. If we have one, we should have the other. - I now use a dot instead of an underscore for unprintable data. This seems to be the default most hex editors use. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2188846678 From stuefe at openjdk.org Tue Jun 25 12:51:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 12:51:13 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 08:20:05 GMT, Thomas Stuefe wrote: > I chose uint8_t since it seemed to me the closest const version of address. address, however, is signed char, but I did not want to pass const char* since that looks like a string. I was mistaken here, address is an unsigned char*. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2188849029 From stuefe at openjdk.org Tue Jun 25 12:56:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 12:56:17 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v32] In-Reply-To: <6BBqynMUvv-_60E_ydHakvgaJLLeHSsEfPJvFRMg9cA=.f19b8187-3a20-4ab9-adaf-1d1ad0a4cbea@github.com> References: <6BBqynMUvv-_60E_ydHakvgaJLLeHSsEfPJvFRMg9cA=.f19b8187-3a20-4ab9-adaf-1d1ad0a4cbea@github.com> Message-ID: On Mon, 24 Jun 2024 14:12:49 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. 
For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Rename tests src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 76: > 74: // 4099 gives a 50% probability of collisions at 76 stacks (as per birthday problem). > 75: static const constexpr int default_table_size = 4099; > 76: int _table_size; Pre-existing. Both table size and, arguably, the table itself are never changed, since we don't resize. At least size could be const, _table too if you move allocation to init list. (latter is a matter of taste, up to you). 
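A minimal sketch of what the suggestion above could look like, in plain standard C++ rather than HotSpot idiom. All names here (`StackStorageSketch`, `bucket`) are hypothetical, and `std::unique_ptr` only stands in for whatever allocation the real class uses. The point is that both members can be `const` once they are set in the constructor's initializer list.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the storage class; not the actual NMT code.
class StackStorageSketch {
  static constexpr int default_table_size = 4099;

  // Both members are fixed at construction, since the table is never resized.
  const int _table_size;
  const std::unique_ptr<uint32_t[]> _table;  // buckets hold 4-byte indices

public:
  explicit StackStorageSketch(int table_size = default_table_size)
    : _table_size(table_size),
      _table(new uint32_t[table_size]()) {}  // value-initialized to zero

  int table_size() const { return _table_size; }

  uint32_t bucket(int i) const {
    assert(i >= 0 && i < _table_size);
    return _table[i];
  }
};
```

Members are initialized in declaration order, so `_table_size` is declared first; the sketch still uses the constructor parameter for the allocation to avoid depending on that order.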
src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 108: > 106: } > 107: } > 108: } Could you move this, and the dtor, over to the cpp file, too? You probably can remove the allocation.hpp include then. Less include deps are always good. test/hotspot/gtest/nmt/test_arrayWithFreeList.cpp line 71: > 69: return e; > 70: } > 71: }; nit, add newline ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1652713763 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1652764983 PR Review Comment: https://git.openjdk.org/jdk/pull/18979#discussion_r1652769018 From amitkumar at openjdk.org Tue Jun 25 13:07:39 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 13:07:39 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v5] In-Reply-To: References: Message-ID: > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ? 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ? 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ? 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ? 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ? 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ? 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ? 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ? 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ? 
0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ± 1.628 ns... 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: comments from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/1042f43a..5a30c51d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=03-04 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From amitkumar at openjdk.org Tue Jun 25 13:07:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 13:07:40 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 14:12:05 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp >> - rename: r_scratch to r_result in repne_scan method > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3325: > >> 3323: assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), ""); >> 3324: >> 3325: z_cghi(r_bitmap, (long)-1); > > Why not compare against `(unitx)Klass::SECONDARY_SUPERS_BITMAP_FULL`? Updated, Please see the latest commit. Tier1 test are still fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1652788045 From jsjolen at openjdk.org Tue Jun 25 13:12:02 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 25 Jun 2024 13:12:02 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v33] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. 
It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
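The allocator idea described in the message above can be illustrated with a small, self-contained sketch. This is not the PR's code: `IndexedFreelist`, `Index`, `nil`, `allocate`, `deallocate`, `at` and `capacity` are hypothetical names chosen for illustration, and `std::vector` stands in for HotSpot's `GrowableArray`. It only assumes what the message states: elements are addressed by 4-byte indices, unused slots form a linked list, and the backing storage never shrinks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical sketch, not the HotSpot implementation.
// Elements live in one growable array and are addressed by a 4-byte index.
template <typename E>
class IndexedFreelist {
  static_assert(std::is_trivially_copyable<E>::value,
                "slots are reused by plain assignment");
  static_assert(std::is_trivially_destructible<E>::value,
                "no destructor is ever run for a slot");

public:
  using Index = int32_t;            // the 4-byte "pointer"
  static constexpr Index nil = -1;  // marks the end of the free list

  // Reuse a free slot if there is one, otherwise grow the array by one.
  Index allocate(const E& value) {
    Index i;
    if (_free_head != nil) {
      i = _free_head;
      _free_head = _slots[i].next_free;
    } else {
      i = static_cast<Index>(_slots.size());
      _slots.emplace_back();
    }
    _slots[i].e = value;
    return i;
  }

  // Push the slot onto the free list; the array itself never shrinks.
  void deallocate(Index i) {
    _slots[i].next_free = _free_head;
    _free_head = i;
  }

  E& at(Index i) { return _slots[i].e; }
  std::size_t capacity() const { return _slots.size(); }

private:
  // A slot holds either a live element or the index of the next free slot.
  union Slot {
    E e;
    Index next_free;
    Slot() : next_free(nil) {}
  };

  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

With this shape, a hash table bucket can store an `Index` (4 bytes) instead of a native pointer (8 bytes on 64-bit platforms), which is the kind of saving the message describes. The cost is that every access goes through the backing array.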
Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Thomas's review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/67a8a218..3d9e324c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=31-32 Stats: 33 lines in 3 files changed: 18 ins; 12 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From lucy at openjdk.org Tue Jun 25 13:18:17 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 25 Jun 2024 13:18:17 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v7] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Tue, 25 Jun 2024 11:06:23 GMT, Amit Kumar wrote: >> s390x port for recursive locking. >> >> testing: >> - [x] build fastdebug-vm >> - [x] build slowdebug-vm >> - [x] build release-vm >> - [x] build optimized-vm >> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (release-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] tier1 with fastdebug-vm >> - [x] tier1 with slowdebug-vm >> - [x] tier1 with release-vm >> >> *BenchMarks*: >> >> Results from Performance LPARs : >> >> >> Locking Mode = 1 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 
89.475 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> Locking Mode = 1 (with patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 0.519 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> >> >> >> Locking Mode = 2 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 2.229 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 0.416 ns/op >> LockUnlock.te... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comments from Lutz No further complaints. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18878#pullrequestreview-2138647314 From stuefe at openjdk.org Tue Jun 25 13:24:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Jun 2024 13:24:14 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v33] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:12:02 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. 
It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Thomas's review comments I think this looks good now.
------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2138666251 From amitkumar at openjdk.org Tue Jun 25 13:27:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 13:27:15 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v7] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: <8WiczWnhx-JaaqiOc7uwmHQgAoeXRA9K4Nr426dypg0=.ae482240-6386-4e94-acc7-34e4230ec56e@github.com> On Tue, 25 Jun 2024 11:06:23 GMT, Amit Kumar wrote: >> s390x port for recursive locking. >> >> testing: >> - [x] build fastdebug-vm >> - [x] build slowdebug-vm >> - [x] build release-vm >> - [x] build optimized-vm >> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (release-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] tier1 with fastdebug-vm >> - [x] tier1 with slowdebug-vm >> - [x] tier1 with release-vm >> >> *BenchMarks*: >> >> Results from Performance LPARs : >> >> >> Locking Mode = 1 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> Locking Mode = 1 (with patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.146 ? 
0.027 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 0.519 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> >> >> >> Locking Mode = 2 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 2.229 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 0.416 ns/op >> LockUnlock.te... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comments from Lutz @TheRealMDoerr would you take final look, before I integrate it :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18878#issuecomment-2188962304 From lucy at openjdk.org Tue Jun 25 13:34:12 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 25 Jun 2024 13:34:12 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: <-RKRQWDeR-hsOUqHtQ9Ip05iHsanDKGmN50FOM4Lj6A=.d4f4ce42-b0ad-47c1-8416-91cec99cbfc1@github.com> On Mon, 24 Jun 2024 18:44:29 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp >> - rename: r_scratch to r_result in repne_scan method > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3171: > >> 3169: >> 3170: bind(L_loop); >> 3171: z_cg(r_value, Address(r_addr)); > > Idea: I think it is possible to use z_xg to do both at once: Set to 0 iff equal and CC to equal. (Like PPC64.) Good idea. But will have no benefit. When using z_xg, r_value will be altered. 
Subsequent iterations will then compare against a modified value. To use the triadic register instruction (XGRK), we first have to load the value into a scratch register. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1652830658 From lucy at openjdk.org Tue Jun 25 13:34:13 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 25 Jun 2024 13:34:13 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 03:26:24 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3173: >> >>> 3171: z_cg(r_value, Address(r_addr)); >>> 3172: z_bre(L_exit); // branch on success >>> 3173: add2reg(r_addr, wordSize); >> >> Can add2reg change CC? I'd probably rather use z_la for the addition without changing CC. > > I just noticed that `PreferLAoverADD` is by default set to false. I'm looking for a reason for that, maybe we can set it to true. Otherwise I'll update it to z_la. > > CC: @RealLucy Use LA explicitly here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1652832511 From jsjolen at openjdk.org Tue Jun 25 13:39:58 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 25 Jun 2024 13:39:58 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34] In-Reply-To: References: Message-ID: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> > Hi, > > This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. > > We then use this allocator in order to store the `NativeCallStackStorage` hash table.
This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. > > The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. > > It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. > > The results are as follows on linux-x64-slowdebug: > > > Generate stacks... Done > Time taken with GrowableArray: 8341.240945 > Time taken with CHeap: 12189.031318 > Time taken with Arena: 8800.703092 > Time taken with GrowableArray again: 8295.508829 > > > And on linux-x64: > > > Time taken with GrowableArray: 8378.018135 > Time taken with CHeap: 12437.347868 > Time taken with Arena: 8758.064717 > Time taken with GrowableArray again: 8391.076291 > > > Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Do not use char array ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18979/files - new: https://git.openjdk.org/jdk/pull/18979/files/3d9e324c..eefeb926 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18979&range=32-33 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18979/head:pull/18979 PR: https://git.openjdk.org/jdk/pull/18979 From jsjolen at openjdk.org Tue Jun 25 13:39:58 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 25 Jun 2024 13:39:58 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v33] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:12:02 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. 
>> >> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Thomas's review comments One last change: I've removed the `char e[sizeof(E)]` part of the union and made it `E e`. As the type is trivially copyable and trivially destructible, this is fine. I realised that the alignment of `I` may be larger than the alignment of `E` if we're unlucky (storing a `char`, for example), which can lead to faulty behavior. We're now a bit safer.
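How the two union layouts mentioned in the comment above can differ in alignment is sketched below. This is an illustration under assumptions, not the PR's code: `Wide`, `RawSlot`, `TypedSlot` and `raw_slot_underaligned` are hypothetical names, and the example deliberately picks an element type that is more strictly aligned than the index type, since that is the case where a raw `char` array visibly loses the element's alignment requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element type with a strict alignment requirement.
struct alignas(16) Wide {
  char bytes[16];
};

// Resembles the earlier layout: raw storage for E as a char array.
// A char array has alignment 1, so only I contributes to the union's alignment.
template <typename E, typename I>
union RawSlot {
  char e[sizeof(E)];
  I next;
};

// Resembles the later layout: E is stored directly.
// The union is now at least as aligned as both E and I.
template <typename E, typename I>
union TypedSlot {
  E e;
  I next;
};

// Guaranteed by the language: a union is at least as aligned as each member.
static_assert(alignof(TypedSlot<Wide, int32_t>) >= alignof(Wide),
              "typed slot honours the alignment of E");
static_assert(alignof(TypedSlot<char, int32_t>) >= alignof(int32_t),
              "typed slot honours the alignment of I");

// Expected to be true on mainstream compilers, where the raw slot is only
// as aligned as int32_t; kept as a value rather than a static_assert
// because the standard does not strictly forbid extra alignment.
constexpr bool raw_slot_underaligned =
    alignof(RawSlot<Wide, int32_t>) < alignof(Wide);
```

Letting the compiler see `E` directly means size and alignment of the slot are derived from both members, so no manual bookkeeping is needed when the element type changes.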
------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2188990041 From jkratochvil at openjdk.org Tue Jun 25 13:42:14 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Tue, 25 Jun 2024 13:42:14 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v6] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 12:06:43 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in non-containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is indicated by the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read-only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed.
>> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision: > > Remove problem listing of PlainRead which is reworked here Currently this patch conflicts a lot with #19085 (jerboaa:jdk-8331560-cgroup-controller-delegation). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2188994651 From sgehwolf at openjdk.org Tue Jun 25 13:45:18 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 25 Jun 2024 13:45:18 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v6] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:39:07 GMT, Jan Kratochvil wrote: > Currently this patch conflicts a lot with #19085 (jerboaa:jdk-8331560-cgroup-controller-delegation). Yes, I'll resolve it one way or another depending on which one goes in first. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2189001364 From sgehwolf at openjdk.org Tue Jun 25 13:54:46 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 25 Jun 2024 13:54:46 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v7] In-Reply-To: References: Message-ID: > Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. 
If neither of those conditions holds, the JVM runs in non-containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is indicated by the following trace log line: > > > [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present > > > This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: > > > java -XshowSettings:system --version > Operating System Metrics: > Provider: cgroupv1 > System not containerized. > openjdk 23-internal 2024-09-17 > OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) > > > The basic property this is being built on is the observation that the cgroup controllers typically get mounted read-only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. > > Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. > > Testing: > > - [x] GHA (risc-v failure seems infra related) > - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) > - [x] Some manual testing using cri-o > > Thoughts? Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 17 commits: - Refactor mount info matching to helper function - Merge branch 'master' into jdk-8261242-is-containerized-fix - Remove problem listing of PlainRead which is reworked here - Merge branch 'master' into jdk-8261242-is-containerized-fix - Merge branch 'master' into jdk-8261242-is-containerized-fix - Add doc for mountinfo scanning. - Unify naming of variables - Merge branch 'master' into jdk-8261242-is-containerized-fix - Merge branch 'master' into jdk-8261242-is-containerized-fix - jcheck fixes - ... and 7 more: https://git.openjdk.org/jdk/compare/baafa662...532ea33b ------------- Changes: https://git.openjdk.org/jdk/pull/18201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=06 Stats: 411 lines in 20 files changed: 305 ins; 79 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/18201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18201/head:pull/18201 PR: https://git.openjdk.org/jdk/pull/18201 From sgehwolf at openjdk.org Tue Jun 25 13:54:47 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 25 Jun 2024 13:54:47 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v6] In-Reply-To: <3garHvE8lhPovujClt422-1pcIcs7z7zpqpngEHDd6w=.8776bce8-2b79-44bd-8355-d753562a75cf@github.com> References: <3garHvE8lhPovujClt422-1pcIcs7z7zpqpngEHDd6w=.8776bce8-2b79-44bd-8355-d753562a75cf@github.com> Message-ID: On Thu, 20 Jun 2024 17:37:05 GMT, Thomas Stuefe wrote: >> Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove problem listing of PlainRead which is reworked here > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 422: > >> 420: * (12) super options: matched with '%s' and captured in 'tmpcgroups' >> 421: */ >> 422: if (sscanf(p, "%*d %*d %*d:%*d %s %s %s%*[^-]- %s %*s %s", tmproot, tmpmount, mount_opts, tmp_fs_type, tmpcgroups) == 5) { > > The only difference to v1 
is that we parse the super options (12), right? Could we factor out the parsing into a helper function? Or, alternatively, at least `#define` the scanf format somewhere up top, including the nice comment, and reuse that format string? That's correct. I've moved it into a local helper function. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1652863523 From azafari at openjdk.org Tue Jun 25 14:06:16 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 25 Jun 2024 14:06:16 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34] In-Reply-To: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> References: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> Message-ID: On Tue, 25 Jun 2024 13:39:58 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimential API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. 
>> >> It sounded expensive to have the GrowableArray continously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Do not use char array Thanks, my comments are applied. ------------- Marked as reviewed by azafari (Committer). PR Review: https://git.openjdk.org/jdk/pull/18979#pullrequestreview-2138779504 From amitkumar at openjdk.org Tue Jun 25 14:19:44 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 14:19:44 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ? 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ? 
0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ? 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ? 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ? 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ? 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ? 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ? 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ? 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ? 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ? 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ? 0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ? 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ? 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ? 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ? 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ? 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ? 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ? 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ? 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ? 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ? 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ? 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ? 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ? 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ? 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ? 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ? 1.628 ns... 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: add2reg -> z_la ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/5a30c51d..ebbca614 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From amitkumar at openjdk.org Tue Jun 25 14:19:44 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 25 Jun 2024 14:19:44 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:31:50 GMT, Lutz Schmidt wrote: >> I just noticed that `PreferLAoverADD` is by default set to false. I'm looking for a reason for that, maybe we can set it to true. Otherwise I'll update it to z_la. >> >> CC: @RealLucy > > Use LA explicitly here. done; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1652911303 From mdoerr at openjdk.org Tue Jun 25 14:22:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Jun 2024 14:22:11 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: <-RKRQWDeR-hsOUqHtQ9Ip05iHsanDKGmN50FOM4Lj6A=.d4f4ce42-b0ad-47c1-8416-91cec99cbfc1@github.com> References: <-RKRQWDeR-hsOUqHtQ9Ip05iHsanDKGmN50FOM4Lj6A=.d4f4ce42-b0ad-47c1-8416-91cec99cbfc1@github.com> Message-ID: On Tue, 25 Jun 2024 13:30:37 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3171: >> >>> 3169: >>> 3170: bind(L_loop); >>> 3171: z_cg(r_value, Address(r_addr)); >> >> Idea: I think it is possible to use z_xg to do both at once: Set to 0 iff equal and CC to equal. (Like PPC64.) 
>
> Good idea. But will have no benefit. When using z_xg, r_value will be altered. Subsequent iterations will then compare against a modified value. To use the triadic register instruction (XGRK), we first have to load the value into a scratch register.

Ok. So, maybe better keep it as it is.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1652921950

From jsjolen at openjdk.org  Tue Jun 25 14:40:20 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 25 Jun 2024 14:40:20 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34]
In-Reply-To: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com>
References: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com>
Message-ID:

On Tue, 25 Jun 2024 13:39:58 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.
>>
>> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.
>>
>> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>>
>> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>>
>> The results are as follows on linux-x64-slowdebug:
>>
>>
>> Generate stacks... Done
>> Time taken with GrowableArray: 8341.240945
>> Time taken with CHeap: 12189.031318
>> Time taken with Arena: 8800.703092
>> Time taken with GrowableArray again: 8295.508829
>>
>>
>> And on linux-x64:
>>
>>
>> Time taken with GrowableArray: 8378.018135
>> Time taken with CHeap: 12437.347868
>> Time taken with Arena: 8758.064717
>> Time taken with GrowableArray again: 8391.076291
>>
>>
>> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Do not use char array

Thank you for the in-depth reviewing of this change.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2189139743

From jsjolen at openjdk.org  Tue Jun 25 14:40:22 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 25 Jun 2024 14:40:22 GMT
Subject: Integrated: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage
In-Reply-To:
References:
Message-ID:

On Fri, 26 Apr 2024 14:29:53 GMT, Johan Sjölen wrote:

> Hi,
>
> This PR introduces a new allocator, `IndexedFreelistAllocator`. It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names.
>
> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes.
> For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage.
>
> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help.
>
> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR.
>
> The results are as follows on linux-x64-slowdebug:
>
>
> Generate stacks... Done
> Time taken with GrowableArray: 8341.240945
> Time taken with CHeap: 12189.031318
> Time taken with Arena: 8800.703092
> Time taken with GrowableArray again: 8295.508829
>
>
> And on linux-x64:
>
>
> Time taken with GrowableArray: 8378.018135
> Time taken with CHeap: 12437.347868
> Time taken with Arena: 8758.064717
> Time taken with GrowableArray again: 8391.076291
>
>
> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.

This pull request has now been integrated.
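
For readers unfamiliar with the idea, the index-based freelist described in the PR text can be sketched roughly as below. This is only an illustration under stated assumptions, not the HotSpot code: `std::vector` stands in for `GrowableArray`, the names `IndexedFreeList`, `allocate`, `deallocate`, `at` and `capacity` are made up here, and the sketch only handles trivial element types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical sketch: elements live in one growable array and are addressed
// by a 4-byte index instead of an 8-byte pointer. Freed slots are chained
// into a linked list through the slots themselves.
template <typename T>
class IndexedFreeList {
  static_assert(std::is_trivial<T>::value,
                "this sketch only handles trivial element types");

 public:
  using Index = std::int32_t;        // the 4-byte "pointer"
  static constexpr Index nil = -1;   // marks the end of the free list

  // Reuse a freed slot if one exists, otherwise grow the backing array.
  Index allocate(const T& value) {
    Index i;
    if (_free_head != nil) {
      i = _free_head;
      _free_head = _slots[static_cast<std::size_t>(i)].next_free;
    } else {
      i = static_cast<Index>(_slots.size());
      _slots.push_back(Slot());
    }
    _slots[static_cast<std::size_t>(i)].value = value;
    return i;
  }

  // Push the slot onto the free list; the backing array never shrinks.
  void deallocate(Index i) {
    assert(in_range(i));
    _slots[static_cast<std::size_t>(i)].next_free = _free_head;
    _free_head = i;
  }

  T& at(Index i) {
    assert(in_range(i));
    return _slots[static_cast<std::size_t>(i)].value;
  }

  std::size_t capacity() const { return _slots.size(); }

 private:
  // A slot holds either a live value or the index of the next free slot.
  union Slot {
    T value;
    Index next_free;
  };

  bool in_range(Index i) const {
    return i >= 0 && static_cast<std::size_t>(i) < _slots.size();
  }

  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

In this sketch, allocating two values, freeing the first and allocating again hands back the first index without growing the array, which is the "cannot shrink, but reuses slots" behaviour the description mentions.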
Changeset: 57f8b91e
Author:    Johan Sjölen
URL:       https://git.openjdk.org/jdk/commit/57f8b91e558e5b9ff9c2000b8f74e3a1988ead2b
Stats:     377 lines in 4 files changed: 326 ins; 38 del; 13 mod

8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage

Reviewed-by: stuefe, azafari

-------------

PR: https://git.openjdk.org/jdk/pull/18979

From kvn at openjdk.org  Tue Jun 25 14:59:16 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 25 Jun 2024 14:59:16 GMT
Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out
In-Reply-To:
References:
Message-ID: <3e26mxRAFeSkB1xNMeqjfF4RirxvQyk4hulhciPPpkI=.fe46ac1d-3fa9-4ecd-9809-c572c320ca4f@github.com>

On Tue, 25 Jun 2024 03:20:42 GMT, Dean Long wrote:

> Where are these strings actually located? Do they need to be relocated if the CodeBuffer expands?

Thank you, @dean-long, for looking.

They are not located in CodeCache space, they are external. It is a long path from the `code_string(ss.as_string())` call, but in the end the string is duplicated in the C heap:
[codeBuffer.cpp#L1088](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.cpp#L1088)

But the structure which keeps track of these strings is part of CodeBlob (`CodeBlob::_dbg_strings`), which is not copied by `CodeBuffer::expand()`.
`expand()` has similar code to `CodeBuffer::copy_code_to()` except for copying `_dbg_strings` and `_asm_remarks`.

So we do have an issue with expand: we may not free the space allocated for strings and comments when an expanded nmethod is deoptimized.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2189187710 From mbaesken at openjdk.org Tue Jun 25 15:04:33 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 25 Jun 2024 15:04:33 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' Message-ID: With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : runtime/CommandLine/PrintClasses_id0.jtr src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) ------------- Commit messages: - JDK-8333363 Changes: 
https://git.openjdk.org/jdk/pull/19885/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19885&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333363 Stats: 20 lines in 1 file changed: 16 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19885/head:pull/19885 PR: https://git.openjdk.org/jdk/pull/19885 From mbaesken at openjdk.org Tue Jun 25 15:07:10 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 25 Jun 2024 15:07:10 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' In-Reply-To: References: Message-ID: <_niNgXywZs0fxfBg3tgjHL5QAlW-s5Vf3wVKhFw8Ljg=.d42d21ff-2afe-4099-b580-b6b9d0ef162c@github.com> On Tue, 25 Jun 2024 15:00:03 GMT, Matthias Baesken wrote: > With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : > > runtime/CommandLine/PrintClasses_id0.jtr > > src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' > #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 > #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > #8 0xffffae5a7fd8 
in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) > #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) The other returns pointers (e.g. class_type_annotations()) can be nullptr too, so we need the same checking there as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19885#issuecomment-2189206582 From coleenp at openjdk.org Tue Jun 25 15:33:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Jun 2024 15:33:11 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' In-Reply-To: References: Message-ID: <4tL8Iv8Cd3FVDe-XExXHkvFO24rgF6IFCcNuyql74jk=.abe8466b-055b-4867-a045-90aae26292c4@github.com> On Tue, 25 Jun 2024 15:00:03 GMT, Matthias Baesken wrote: > With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : > > runtime/CommandLine/PrintClasses_id0.jtr > > src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' > #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 > #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0xffffae599c88 in VM_Operation::evaluate() 
src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) > #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) Can you make this a bigger change to handle all the potentially null pointers the same way? src/hotspot/share/oops/instanceKlass.cpp line 3607: > 3605: st->cr(); > 3606: } > 3607: if (class_annotations() != nullptr) { I hate to say it but this whole function looks like it should be rewritten. There are other places that could be null, like local_interfaces, and transitive_interfaces. I wonder if you should have a macro above with a string BULLET string, and do them all like this. ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19885#pullrequestreview-2139041977 PR Review Comment: https://git.openjdk.org/jdk/pull/19885#discussion_r1653047793 From coleenp at openjdk.org Tue Jun 25 15:38:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Jun 2024 15:38:10 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' In-Reply-To: <4tL8Iv8Cd3FVDe-XExXHkvFO24rgF6IFCcNuyql74jk=.abe8466b-055b-4867-a045-90aae26292c4@github.com> References: <4tL8Iv8Cd3FVDe-XExXHkvFO24rgF6IFCcNuyql74jk=.abe8466b-055b-4867-a045-90aae26292c4@github.com> Message-ID: On Tue, 25 Jun 2024 15:30:02 GMT, Coleen Phillimore wrote: >> With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : >> >> runtime/CommandLine/PrintClasses_id0.jtr >> >> src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' >> #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 >> #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0xffffae5a80bc in VMThread::run() 
src/hotspot/share/runtime/vmThread.cpp:177
>> #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
>> #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846
>> #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4)
>> #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8)
>
> src/hotspot/share/oops/instanceKlass.cpp line 3607:
>
>> 3605:     st->cr();
>> 3606:   }
>> 3607:   if (class_annotations() != nullptr) {
>
> I hate to say it but this whole function looks like it should be rewritten. There are other places that could be null, like local_interfaces, and transitive_interfaces. I wonder if you should have a macro above with a BULLET string, and do them all like this.

It might also be that all the metadata print_on functions should use the same thing. There's

    static void print_value_on_maybe_null(outputStream* st, const Metadata* m) {
      if (nullptr == m)
        st->print("null");
      else
        m->print_value_on(st);
    }

Maybe that should take the string with BULLET and print the whole thing in the else statement. Or add a similar one that prints the address.
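
As an illustration of the suggestion above, here is one possible shape for a helper that takes the label string and prints the whole line only when the pointer is non-null. It is a standalone sketch, not HotSpot code: `std::ostream` stands in for `outputStream`, `AnnotationArrayStub` is a made-up stand-in for a metadata type, and the helper name and signature are assumptions.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a metadata type such as AnnotationArray.
struct AnnotationArrayStub {
  int length;
  void print_value_on(std::ostream& st) const {
    st << "AnnotationArray[" << length << "]";
  }
};

// Prints "<label><value>\n" only when m is non-null, so no member function
// is ever called through a null pointer (the kind of call ubsan reports).
template <typename M>
void print_on_maybe_null(std::ostream& st, const char* label, const M* m) {
  if (m != nullptr) {
    st << label;
    m->print_value_on(st);
    st << '\n';
  }
}
```

A caller would replace each unguarded `x()->print_value_on(st)` sequence with a single `print_on_maybe_null(st, " - label: ", x())` call, so every potentially null field is handled the same way.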
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19885#discussion_r1653054693 From mbaesken at openjdk.org Tue Jun 25 16:02:16 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 25 Jun 2024 16:02:16 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 15:00:03 GMT, Matthias Baesken wrote: > With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : > > runtime/CommandLine/PrintClasses_id0.jtr > > src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' > #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 > #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) > #13 
0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) > There are other places that could be null, like local_interfaces, and transitive_interfaces. Hi Coleen, in this change I only adjusted the ones that were really reported when running HS :tier1 with ubsan enabled binaries. So yes, maybe some others could be null too, but for those I really saw it then running the tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19885#issuecomment-2189344064 From szaldana at openjdk.org Tue Jun 25 16:43:19 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 25 Jun 2024 16:43:19 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34] In-Reply-To: References: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> Message-ID: On Tue, 25 Jun 2024 14:36:28 GMT, Johan Sj?len wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Do not use char array > > Thank you for the in-depth reviewing of this change. Hi @jdksjolen, I'm seeing some build failures after this PR got integrated. I filed [JDK-8335108](https://bugs.openjdk.org/browse/JDK-8335108) for reference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2189434222 From duke at openjdk.org Tue Jun 25 17:33:23 2024 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 25 Jun 2024 17:33:23 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 19:21:37 GMT, Anthony Scarpino wrote: >>> What causes regression in P256 "(~-8-14%)"? From what I see, you re-arranged code to not execute some code ("reducePositive()") when it is not needed. How this affects P256? >> >> Actually, the other way around; reducePositive is now an unconditionally executed for both pure java and the intrinsic paths. 
Perhaps that's what is misleading, it was only the mult() intrinsic that was taking advantage of this 'skip reduction' before. (pure java did not benefit from removing reduction, so I kept it. Now 'keeping it' for both paths) > > Hi @vpaprotsk, > @ferakocz is going to take a look at the change. When he says it's ok, I'll approve the PR. @ascarpino please approve this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2189546944 From duke at openjdk.org Tue Jun 25 17:33:22 2024 From: duke at openjdk.org (Ferenc Rakoczi) Date: Tue, 25 Jun 2024 17:33:22 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR. >> >> --- >> XDH.generateSecret performance >> before Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ? 27.230 ops/s >> >> after Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ? 22.071 ops/s >> >> with this PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ? 
32.858 ops/s >> >> --- >> >> P256 performance with/without mult intrinsic: >> >> Performance before Montgomery PR: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ? 7.400 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ? 5.995 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ? 54.660 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ? 42.438 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ? 28.514 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ? 32.050 ops/s >> >> Performance in master without mult() intrinsic >> >> Benchmark ... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Sandhya Looks good to me. It would be good, though, to figure out what else could be done to regain the P256 performance with keeping the speed of this code path. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2189545307 From coleenp at openjdk.org Tue Jun 25 17:56:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Jun 2024 17:56:09 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 15:00:03 GMT, Matthias Baesken wrote: > With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : > > runtime/CommandLine/PrintClasses_id0.jtr > > src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' > #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 > #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) > #13 
0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8)

Why not add this to metadata.hpp:


+  template <typename M>
+  static void print_on_maybe_null(outputStream* st, const char* str, const M* m) {
+    if (nullptr != m) {
+      st->print_raw(str);
+      m->print_value_on(st);
+      st->cr();
+    }
+  }


and use it for the things that ubsan complains about now. Then you could use it for the next set of ubsan complaints.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19885#issuecomment-2189618656

From jsjolen at openjdk.org  Tue Jun 25 18:00:24 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 25 Jun 2024 18:00:24 GMT
Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34]
In-Reply-To:
References: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com>
Message-ID: <7Qgqdta_MVhWQTjP4nGU3eDIsaLsyr-YWrkGPnAyk3s=.7d80bde7-a1d0-495d-b7c9-0ab90e24d440@github.com>

On Tue, 25 Jun 2024 14:36:28 GMT, Johan Sjölen wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Do not use char array
>
> Thank you for the in-depth reviewing of this change.

> Hi @jdksjolen, I'm seeing some build failures after this PR got integrated. I filed [JDK-8335108](https://bugs.openjdk.org/browse/JDK-8335108) for reference.

Hi Sonia, thanks for contacting me. It seems like this is a C++20 issue, but Hotspot only supports up to C++14. Here's a Godbolt for reproducing the issue, change the `-std=c++14` to `-std=c++20` to see the issue: https://godbolt.org/z/eK5GaTceY

That ought to mean that I don't have to back out this PR and re-introduce it with the bug fixed, but I still think that we should fix the bug because we do want to get to C++20 some day :-).

There's something weird with the fact that you get builds with C++20 support enabled, not sure what your configure options are.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2189625153 From matsaave at openjdk.org Tue Jun 25 18:02:18 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 25 Jun 2024 18:02:18 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time In-Reply-To: References: Message-ID: <4t-Lh3G9nOxVUDt8cWFfwDigEU_fAsXvEEdD2uqGfqg=.da1a5f6c-c4e5-401c-9962-26b248264144@github.com> On Mon, 24 Jun 2024 17:21:18 GMT, Ioi Lam wrote: > Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. > > - This PR uses the same framework introduced in #19355 and just added handling for methods. > - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) Looks good! I have one consideration but otherwise I approve. src/hotspot/share/interpreter/interpreterRuntime.cpp line 930: > 928: CallInfo call_info; > 929: switch (bytecode) { > 930: case Bytecodes::_invokevirtual: LinkResolver::cds_resolve_virtual_call (call_info, link_info, CHECK); break; I think the the `cds_resolve_xyz_call()` methods might be unnecessary. You can just call the existing methods from LinkResolver besides `resolve_virtual_call` src/hotspot/share/oops/cpCache.cpp line 454: > 452: > 453: // Just for safety -- this should not happen, but do not archive if we ever see this. > 454: resolved &= !(rme->is_resolved(Bytecodes::_invokehandle) || Don't forget to fix the whitespace problem here ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19866#pullrequestreview-2136680123 PR Review Comment: https://git.openjdk.org/jdk/pull/19866#discussion_r1653181970 PR Review Comment: https://git.openjdk.org/jdk/pull/19866#discussion_r1651570587 From vlivanov at openjdk.org Tue Jun 25 18:11:16 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Jun 2024 18:11:16 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. > > Tested with tier1-7. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19755#pullrequestreview-2139442483 From kcr at openjdk.org Tue Jun 25 18:27:13 2024 From: kcr at openjdk.org (Kevin Rushforth) Date: Tue, 25 Jun 2024 18:27:13 GMT Subject: [jdk23] RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: <3yzDRbN8qzDoS-J0YiDADvLLh_XCQIpHRPZCkrDVlfI=.7e3577e9-555e-458a-868b-fa06014c698b@github.com> On Mon, 24 Jun 2024 09:02:30 GMT, Aleksey Shipilev wrote: > Clean backport to fix a deadlock. @shipilev Is the priority (P4) of this bug correct? If so, then it doesn't seem to meet the criteria for JDK 23 during RDP1. If the priority is wrong, please update it in JBS. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19851#issuecomment-2189682009 From szaldana at openjdk.org Tue Jun 25 19:00:37 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 25 Jun 2024 19:00:37 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34] In-Reply-To: <7Qgqdta_MVhWQTjP4nGU3eDIsaLsyr-YWrkGPnAyk3s=.7d80bde7-a1d0-495d-b7c9-0ab90e24d440@github.com> References: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> <7Qgqdta_MVhWQTjP4nGU3eDIsaLsyr-YWrkGPnAyk3s=.7d80bde7-a1d0-495d-b7c9-0ab90e24d440@github.com> Message-ID: On Tue, 25 Jun 2024 17:57:36 GMT, Johan Sj?len wrote: >> Thank you for the in-depth reviewing of this change. > >> Hi @jdksjolen, I'm seeing some build failures after this PR got integrated. I filed [JDK-8335108](https://bugs.openjdk.org/browse/JDK-8335108) for reference. > > Hi Sonia, thanks for contacting me. It seems like this is a C++20 issue, but Hotspot only supports up to C++14. Here's a Godbolt for reproducing the issue, change the `-std=c++14` to `-std=c++20` to see the issue: https://godbolt.org/z/eK5GaTceY > > That ought to mean that I don't have to back out this PR and re-introduce it with the bug fixed, but I still think that we should fix the bug because we do want to get to C++20 some day :-). > > There's something weird with the fact that you get builds with C++20 support enabled, not sure what your configure options are. Hi @jdksjolen, thanks for looking into it! > Hi Sonia, thanks for contacting me. It seems like this is a C++20 issue, but Hotspot only supports up to C++14. 
Here's a Godbolt for reproducing the issue, change the `-std=c++14` to `-std=c++20` to see the issue: https://godbolt.org/z/eK5GaTceY > > That ought to mean that I don't have to back out this PR and re-introduce it with the bug fixed, but I still think that we should fix the bug because we do want to get to C++20 some day :-). Makes sense. I'll submit a follow-up PR to fix this. > There's something weird with the fact that you get builds with C++20 support enabled, not sure what your configure options are. I'm not sure. I don't have any special configure options. I came across this error after a plain ```bash configure``` but I guess my installed g++ version is 14.1.1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2189747196 From kvn at openjdk.org Tue Jun 25 19:53:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Jun 2024 19:53:10 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 03:20:42 GMT, Dean Long wrote: > Can we assert that uses of ExternalAddress do not point inside the CodeBuffer/CodeBlob? You might catch additional cases. To clarify, these strings are externals and using ExternalAddress for them is correct. But we don't need to use relocation for them (no need to patch) because they don't move and `pushptr()` in `_verify_oops` loads whole 64-bit address into register - no relative addressing. That is why I use `relocInfo::none` for them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2189850937 From coleenp at openjdk.org Tue Jun 25 19:54:15 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Jun 2024 19:54:15 GMT Subject: RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". 
JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. > > Tested with tier1-7. Thanks David and Vladimir for the reviews, and Chris for the report, test and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19755#issuecomment-2189849784 From coleenp at openjdk.org Tue Jun 25 19:54:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Jun 2024 19:54:16 GMT Subject: Integrated: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 17:58:14 GMT, Coleen Phillimore wrote: > Revert the change for JDK-8288064 "Class initialization locking". JVMTI class prepare event relies on a lock being held through setting the state of the class to 'linked' and the JVMTI event posting. The only usable lock is the Java object init_lock, which was removed. This change restores the lock and fixes all the conflicts in code that's changed since. > > Tested with tier1-7. This pull request has now been integrated. Changeset: b3bf31a0 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/b3bf31a0a08da679ec2fd21613243fb17b1135a9 Stats: 516 lines in 16 files changed: 339 ins; 129 del; 48 mod 8333542: Breakpoint in parallel code does not work Co-authored-by: Chris Plummer Reviewed-by: dholmes, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/19755 From kvn at openjdk.org Tue Jun 25 19:59:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Jun 2024 19:59:11 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. 
> The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) I did experiment to see if ExternalAddress reference address in CodeCache. There are few places where it can be changed but it is for different RFE. 
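The table growth quoted above can be illustrated with a toy model. This is only a sketch, not HotSpot code: `GlobalAddressTable` and `simulate_recompiles` are hypothetical names, and the linear-search table is an assumed simplification of whatever structure JDK-8333819 actually uses. It shows why message strings that are freshly constructed on every compilation never deduplicate in a process-wide table, while a shared address is stored only once.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a process-wide table of external addresses.
class GlobalAddressTable {
 public:
  // Returns the index of addr, appending it if it has not been seen before.
  std::size_t find_or_add(const void* addr) {
    for (std::size_t i = 0; i < entries_.size(); ++i) {
      if (entries_[i] == addr) return i;
    }
    entries_.push_back(addr);
    return entries_.size() - 1;
  }
  std::size_t size() const { return entries_.size(); }

 private:
  std::vector<const void*> entries_;
};

// Simulates recompiling one method `recompiles` times. Every compilation
// creates `checks_per_compile` fresh message strings, as if each check site
// built its own error message. When record_in_global_table is false the
// addresses are simply not registered (the "no relocation" case).
// Returns the table size after the simulation.
std::size_t simulate_recompiles(GlobalAddressTable& table, int recompiles,
                                int checks_per_compile,
                                bool record_in_global_table) {
  // Keeping all messages alive guarantees that their addresses are distinct.
  std::vector<std::unique_ptr<std::string>> messages;
  for (int c = 0; c < recompiles; ++c) {
    for (int k = 0; k < checks_per_compile; ++k) {
      messages.push_back(std::make_unique<std::string>("broken oop"));
      if (record_in_global_table) {
        table.find_or_add(messages.back().get());
      }
    }
  }
  return table.size();
}
```

In this model the table grows by one entry per check per recompilation when addresses are registered globally, and stays flat otherwise, which mirrors the 125922 versus 38 entries reported in the thread only qualitatively.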
------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2189859048 From thomas.stuefe at gmail.com Tue Jun 25 20:07:13 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 25 Jun 2024 22:07:13 +0200 Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34] In-Reply-To: References: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> <7Qgqdta_MVhWQTjP4nGU3eDIsaLsyr-YWrkGPnAyk3s=.7d80bde7-a1d0-495d-b7c9-0ab90e24d440@github.com> Message-ID: Johan, I would just remove the noncopyable. It's questionable anyway; there is no reason one shouldn't be able to copy the AWFL, in particular because client code just uses indices, not pointers. On Tue 25. Jun 2024 at 21:01, Sonia Zaldana Calles wrote: > On Tue, 25 Jun 2024 17:57:36 GMT, Johan Sjölen > wrote: > > >> Thank you for the in-depth reviewing of this change. > > > >> Hi @jdksjolen, I'm seeing some build failures after this PR got > integrated. I filed [JDK-8335108]( > https://bugs.openjdk.org/browse/JDK-8335108) for reference. > > > > Hi Sonia, thanks for contacting me. It seems like this is a C++20 issue, > but Hotspot only supports up to C++14. Here's a Godbolt for reproducing the > issue, change the `-std=c++14` to `-std=c++20` to see the issue: > https://godbolt.org/z/eK5GaTceY > > > > That ought to mean that I don't have to back out this PR and > re-introduce it with the bug fixed, but I still think that we should fix > the bug because we do want to get to C++20 some day :-). > > > > There's something weird with the fact that you get builds with C++20 > support enabled, not sure what your configure options are. > > Hi @jdksjolen, thanks for looking into it! > > > Hi Sonia, thanks for contacting me. It seems like this is a C++20 issue, > but Hotspot only supports up to C++14. 
Here's a Godbolt for reproducing the > issue, change the `-std=c++14` to `-std=c++20` to see the issue: > https://godbolt.org/z/eK5GaTceY > > > > That ought to mean that I don't have to back out this PR and > re-introduce it with the bug fixed, but I still think that we should fix > the bug because we do want to get to C++20 some day :-). > > Makes sense. I'll submit a follow-up PR to fix this. > > > There's something weird with the fact that you get builds with C++20 > support enabled, not sure what your configure options are. > > I'm not sure. I don't have any special configure options. I came across > this error after a plain ```bash configure``` but I guess my installed g++ > version is 14.1.1. > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2189747196 From dlong at openjdk.org Tue Jun 25 20:31:09 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Jun 2024 20:31:09 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). 
> Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) How will Leyden relocate these references if they are changed to relocInfo::none? Maybe these strings should be moved to the nmethod's _immutable_data? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2189911090 From kvn at openjdk.org Tue Jun 25 21:30:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Jun 2024 21:30:10 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: <6BUBPndSoAceBfFSbPl9I8tKLlUU_PQbTNLBWnhniX4=.59f2b146-080d-41e8-9628-3e192e377500@github.com> On Tue, 25 Jun 2024 20:28:16 GMT, Dean Long wrote: > How will Leyden relocate these references if they are changed to relocInfo::none? Maybe these strings should be moved to the nmethod's _immutable_data? Very nice idea. I was already think about moving oop_maps and relocation info from CodeBlob into _immutable_data. I will include these strings too. In a separate RFE. Until then Leyden will not support `VerifyOops`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2189999508 From kvn at openjdk.org Tue Jun 25 21:37:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Jun 2024 21:37:08 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. 
With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) I may need to use section_word relocation for that and add additional section type for _immutable_data. I would also need to fix it for other platforms which do not have relocation for these strings. It is not simple change and it needs a separate RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2190009979 From kvn at openjdk.org Tue Jun 25 22:15:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Jun 2024 22:15:10 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. 
ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) I did more investigation and there is another way to solve this. I think I can use `external_word_Relocation::spec_for_immediate()` instead of `relocInfo::none` to avoid growing the global table (because external_word_Relocation::_target is nullptr in this case) and still mark them as external addresses, so Leyden can see these strings and support `VerifyOops`. I did local testing and it works. I will test it more and update the PR. Thank you, @dean-long, for this discussion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2190059908 From ascarpino at openjdk.org Tue Jun 25 22:16:11 2024 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Tue, 25 Jun 2024 22:16:11 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR. >> >> --- >> XDH.generateSecret performance >> before Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 
27.230 ops/s >> >> after Montgomery PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 22.071 ops/s >> >> with this PR: >> >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ± 32.858 ops/s >> >> --- >> >> P256 performance with/without mult intrinsic: >> >> Performance before Montgomery PR: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s >> >> Performance in master without mult() intrinsic >> >> Benchmark ... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Sandhya Approved with review by @ferakocz ------------- Marked as reviewed by ascarpino (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19728#pullrequestreview-2139971558 From duke at openjdk.org Tue Jun 25 22:34:15 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Jun 2024 22:34:15 GMT Subject: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3] In-Reply-To: References: Message-ID: <5zUIiLs4JCGj0L7ullCkipQ0Q1V-gIYjkNt8ylS00YM=.2d42bed6-a3e0-429c-b4e2-14e8e0d1a474@github.com> On Tue, 25 Jun 2024 17:31:09 GMT, Ferenc Rakoczi wrote: >> Hi @vpaprotsk, >> @ferakocz is going to take a look at the change. When he says it's ok, I'll approve the PR. > > @ascarpino please approve this change. Thanks @ferakocz @ascarpino ------------- PR Comment: https://git.openjdk.org/jdk/pull/19728#issuecomment-2190106639 From duke at openjdk.org Tue Jun 25 22:34:16 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Jun 2024 22:34:16 GMT Subject: Integrated: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 In-Reply-To: References: Message-ID: On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote: > This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. > > The fix is to undo 'int' return type on mult()/square(), which allowed to return partially reduced result (e.g. this avoids extra reductions when mult() result is fed into addition). This is the behaviour before the Montgomery ECC PR. > > --- > XDH.generateSecret performance > before Montgomery PR: > > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8435.277 ± 27.230 ops/s > > after Montgomery PR: > > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8309.028 ± 
22.071 ops/s > > with this PR: > > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > KeyAgreementBench.XDH.generateSecret XDH 255 XDH thrpt 3 8491.268 ± 32.858 ops/s > > --- > > P256 performance with/without mult intrinsic: > > Performance before Montgomery PR: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6398.727 ± 7.400 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6129.739 ± 5.995 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1889.928 ± 54.660 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1866.339 ± 42.438 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1350.745 ± 28.514 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s > > Performance in master without mult() intrinsic > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Err... This pull request has now been integrated. Changeset: f101e153 Author: Volodymyr Paprotski Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522 Stats: 125 lines in 9 files changed: 35 ins; 52 del; 38 mod 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 Reviewed-by: sviswanathan, kvn, ascarpino ------------- PR: https://git.openjdk.org/jdk/pull/19728 From ccheung at openjdk.org Tue Jun 25 23:28:10 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 25 Jun 2024 23:28:10 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 17:21:18 GMT, Ioi Lam wrote: > Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. 
> > - This PR uses the same framework introduced in #19355 and just added handling for methods. > - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) src/hotspot/share/interpreter/interpreterRuntime.cpp line 671: > 669: > 670: // check if link resolution caused cpCache to be updated > 671: ConstantPoolCache* cache = pool->cache(); Is this needed? I don't see `cache` being used within the function. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19866#discussion_r1653734163 From dlong at openjdk.org Tue Jun 25 23:39:09 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Jun 2024 23:39:09 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. 
ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) I forgot about spec_for_immediate(). I think it will work. For relocating in Leyden, you may need to enhance Relocation::pd_get_address_from_code() to recognize pushptr(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2190205639 From duke at openjdk.org Tue Jun 25 23:55:35 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 25 Jun 2024 23:55:35 GMT Subject: [jdk23] RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 Message-ID: Hi all, This pull request contains a backport of commit [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. Thanks! 
------------- Commit messages: - Backport f101e153cee68750fcf1f12da10e29806875b522 Changes: https://git.openjdk.org/jdk/pull/19893/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19893&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333583 Stats: 125 lines in 9 files changed: 35 ins; 52 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/19893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19893/head:pull/19893 PR: https://git.openjdk.org/jdk/pull/19893 From gcao at openjdk.org Wed Jun 26 00:02:09 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 26 Jun 2024 00:02:09 GMT Subject: RFR: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length In-Reply-To: <5Dp1uHtGX5nhb1LTDJ-eCinxb5k30GAVnx9iqSjnD84=.9d029b0e-919d-40b7-aff8-0d2276380072@github.com> References: <5Dp1uHtGX5nhb1LTDJ-eCinxb5k30GAVnx9iqSjnD84=.9d029b0e-919d-40b7-aff8-0d2276380072@github.com> Message-ID: On Wed, 19 Jun 2024 08:07:40 GMT, Fei Yang wrote: >> HI, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. This may lead to several jtreg test failures, see jbs issue for exception information. >> >> The reason for this problem is that when spill vector registers into memory, the whole width of the register is used incorrectly, and MaxVectorSize should be used to handle the number of elements spill. >> >> https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 >> >> PR propose to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: >> 1. The CSR_VLENB register of RISC-V is read-only, we can't change it to MaxVectorSize like like aarch64. >> 2. 
It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might bring negative impact on performance. >> 3. If MaxVectorSize equals to VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. >> >> After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: >> >> zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize >> OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform >> intx MaxVectorSize = 32 {C2 product} {command line} >> openjdk version "24-internal" 2025-03-18 >> OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) >> zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize >> intx MaxVectorSize = 32 {C2 product} {command line} >> openjdk version "24-internal" 2025-03-18 >> OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) >> zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./ja... > > Yeah, most of the related RISC-V code was written under the assumption that MaxVectorSize matches `vlenb` CSR (the vector register length in bytes). I agree it will be safer to have this change for now. Also I don't think a MaxVectorSize value smaller than `vlenb` would work if we want to experiment with vector register groups (LMUL > 1) some day for C2 especially when we come to vector reduction operations. @RealFYang : Thanks for the review. 
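The behaviour described in this thread (MaxVectorSize forced to the hardware vector register length, with a warning when the requested value differs) might be sketched as follows. This is a minimal illustration, not the code from the PR: `effective_max_vector_size` is a hypothetical helper, `vlenb` stands in for `VM_Version::_initial_vector_length`, and warning on any mismatch (including a requested value larger than `vlenb`) is an assumption, since the quoted output only shows the 16 and 32 cases in full.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: given the MaxVectorSize requested on the command line
// and the hardware vector register length in bytes (vlenb), returns the value
// the VM would actually use. In this sketch the result is always vlenb.
int effective_max_vector_size(int requested, int vlenb) {
  if (requested != vlenb) {
    // Assumed wording, modelled on the warning shown in the quoted output.
    std::fprintf(stderr,
                 "warning: MaxVectorSize is set to %d on this platform\n",
                 vlenb);
  }
  return vlenb;
}
```

With `vlenb` equal to 32, a request of 16 yields 32 plus a warning, and a request of 32 yields 32 silently, matching the two complete `-XX:+PrintFlagsFinal` runs quoted above.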
------------- PR Comment: https://git.openjdk.org/jdk/pull/19785#issuecomment-2190227009 From kvn at openjdk.org Wed Jun 26 00:47:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Jun 2024 00:47:21 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: <67BEHC-7VLIMlqm6r574TCZ_eyKg5Y94WKcdCIsSW9w=.cb27370f-8bd8-4019-9e65-b730c952f599@github.com> On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. 
With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) Yes, I need add code similar too `oop_Relocation` and `metadata_Relocation` which uses 64-bit immediate because `pushptr()` -> `lea() + push()` and `lea()` -> `mov_literal64()` for x64: [macroAssembler_x86.cpp#L641](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L641) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2190301903 From gcao at openjdk.org Wed Jun 26 01:02:19 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 26 Jun 2024 01:02:19 GMT Subject: Integrated: 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 04:21:24 GMT, Gui Cao wrote: > HI, It's possible to specify a MaxVectorSize which is not equal to VM_Version::_initial_vector_length on RISC-V. For example, it could happen on Banana-Pi that MaxVectorSize equals 16, while VM_Version::_initial_vector_length is 32. This may lead to several jtreg test failures, see jbs issue for exception information. > > The reason for this problem is that when spill vector registers into memory, the whole width of the register is used incorrectly, and MaxVectorSize should be used to handle the number of elements spill. > > https://github.com/openjdk/jdk/blob/326dbb1b139dd1ec1b8605339b91697cdf49da9a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp#L133-L136 > > PR propose to simply set MaxVectorSize to VM_Version::_initial_vector_length for the following reasons: > 1. The CSR_VLENB register of RISC-V is read-only, we can't change it to MaxVectorSize like like aarch64. > 2. 
It does not make sense to me to set MaxVectorSize to a value smaller than VM_Version::_initial_vector_length in the real world, which might bring negative impact on performance. > 3. If MaxVectorSize equals to VM_Version::_initial_vector_length, then we can make use of vs1r_v/vl1r_v when saving and restoring vector registers, which avoids the need to control the number of elements with vsetvli. > > After this patch, MaxVectorSize always equal to VM_Version::_initial_vector_length: > > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=16 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > OpenJDK 64-Bit Server VM warning: MaxVectorSize is set to 32 on this platform > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=32 -XX:+PrintFlagsFinal -version |grep MaxVectorSize > intx MaxVectorSize = 32 {C2 product} {command line} > openjdk version "24-internal" 2025-03-18 > OpenJDK Runtime Environment (fastdebug build 24-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (fastdebug build 24-internal-adhoc.zifeihan.jdk, mixed mode) > zifeihan at plct-c8:~/jdk/build/linux-riscv64-server-fastdebug/jdk/bin$ ./java -XX:MaxVectorSize=64 -XX:+PrintFlagsFinal -vers... This pull request has now been integrated. 
Changeset: c66f785f Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/c66f785fb685d5c378fb4c4cdebdef29c01d321b Stats: 9 lines in 1 file changed: 1 ins; 5 del; 3 mod 8334505: RISC-V: Several tests fail when MaxVectorSize does not match VM_Version::_initial_vector_length Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/19785 From kvn at openjdk.org Wed Jun 26 02:33:25 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Jun 2024 02:33:25 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out [v2] In-Reply-To: References: Message-ID: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. 
With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Use external_word_Relocation::spec_for_immediate() instead of relocInfo::none ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19871/files - new: https://git.openjdk.org/jdk/pull/19871/files/dc586132..6da56e22 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19871&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19871&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19871/head:pull/19871 PR: https://git.openjdk.org/jdk/pull/19871 From kvn at openjdk.org Wed Jun 26 02:33:25 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Jun 2024 02:33:25 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: <0dHgzL5wzf1D7AsnqgD_-9lEjsgQMw2KQBE_uHyx6g4=.e703f397-3c06-4beb-9a27-f2d548169a7e@github.com> On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. 
> I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) I ran tier7 which uses `-Xcomp -XX:+VerifyOops` flags combination. It passed clean with last update with `external_word_Relocation::spec_for_immediate()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2190423003 From duke at openjdk.org Wed Jun 26 03:04:11 2024 From: duke at openjdk.org (Aksh Desai) Date: Wed, 26 Jun 2024 03:04:11 GMT Subject: [jdk23] RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 23:50:20 GMT, Volodymyr Paprotski wrote: > Hi all, > > This pull request contains a backport of commit [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
> > The commit being backported was authored by Volodymyr Paprotski on 25 Jun 2024 and was reviewed by Sandhya Viswanathan, Vladimir Kozlov, Ferenc Rakoczi and Anthony Scarpino. > > Thanks! Marked as reviewed by AkshDesai04 at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/19893#pullrequestreview-2140336933 From iklam at openjdk.org Wed Jun 26 03:11:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 26 Jun 2024 03:11:41 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v2] In-Reply-To: References: Message-ID: > Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. > > - This PR uses the same framework introduced in #19355 and just added handling for methods. > - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into 8309634-resolve-methods-at-dumptime - @calvinccheung and @matias9927 comments - Fixed whitespaces - 8309634: Resolve CONSTANT_MethodRef at CDS dump time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19866/files - new: https://git.openjdk.org/jdk/pull/19866/files/0e8f0ac1..fd039bef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19866&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19866&range=00-01 Stats: 22432 lines in 400 files changed: 15030 ins; 5311 del; 2091 mod Patch: https://git.openjdk.org/jdk/pull/19866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19866/head:pull/19866 PR: https://git.openjdk.org/jdk/pull/19866 From iklam at openjdk.org Wed Jun 26 03:11:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 26 Jun 2024 03:11:41 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v2] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 23:24:59 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into 8309634-resolve-methods-at-dumptime >> - @calvinccheung and @matias9927 comments >> - Fixed whitespaces >> - 8309634: Resolve CONSTANT_MethodRef at CDS dump time > > src/hotspot/share/interpreter/interpreterRuntime.cpp line 671: > >> 669: >> 670: // check if link resolution caused cpCache to be updated >> 671: ConstantPoolCache* cache = pool->cache(); > > Is this needed? I don't see `cache` being used within the function. I removed this line. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19866#discussion_r1653862864 From iklam at openjdk.org Wed Jun 26 03:11:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 26 Jun 2024 03:11:41 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v2] In-Reply-To: <4t-Lh3G9nOxVUDt8cWFfwDigEU_fAsXvEEdD2uqGfqg=.da1a5f6c-c4e5-401c-9962-26b248264144@github.com> References: <4t-Lh3G9nOxVUDt8cWFfwDigEU_fAsXvEEdD2uqGfqg=.da1a5f6c-c4e5-401c-9962-26b248264144@github.com> Message-ID: On Mon, 24 Jun 2024 20:23:30 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into 8309634-resolve-methods-at-dumptime >> - @calvinccheung and @matias9927 comments >> - Fixed whitespaces >> - 8309634: Resolve CONSTANT_MethodRef at CDS dump time > > src/hotspot/share/oops/cpCache.cpp line 454: > >> 452: >> 453: // Just for safety -- this should not happen, but do not archive if we ever see this. >> 454: resolved &= !(rme->is_resolved(Bytecodes::_invokehandle) || > > Don't forget to fix the whitespace problem here Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19866#discussion_r1653862929 From iklam at openjdk.org Wed Jun 26 03:14:20 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 26 Jun 2024 03:14:20 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v2] In-Reply-To: <4t-Lh3G9nOxVUDt8cWFfwDigEU_fAsXvEEdD2uqGfqg=.da1a5f6c-c4e5-401c-9962-26b248264144@github.com> References: <4t-Lh3G9nOxVUDt8cWFfwDigEU_fAsXvEEdD2uqGfqg=.da1a5f6c-c4e5-401c-9962-26b248264144@github.com> Message-ID: On Tue, 25 Jun 2024 16:45:49 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into 8309634-resolve-methods-at-dumptime >> - @calvinccheung and @matias9927 comments >> - Fixed whitespaces >> - 8309634: Resolve CONSTANT_MethodRef at CDS dump time > > src/hotspot/share/interpreter/interpreterRuntime.cpp line 930: > >> 928: CallInfo call_info; >> 929: switch (bytecode) { >> 930: case Bytecodes::_invokevirtual: LinkResolver::cds_resolve_virtual_call (call_info, link_info, CHECK); break; > > I think the the `cds_resolve_xyz_call()` methods might be unnecessary. You can just call the existing methods from LinkResolver besides `resolve_virtual_call` I updated the code to make it easier to read. `cds_resolve_virtual_call()` needs to pass `is_abstract_interpretation=true` to `linktime_resolve_virtual_method()`. The reason is explained in this new comment: // is_abstract_interpretation is true IFF CDS is resolving method references without // running any actual bytecode. Therefore, we don't have an actual recv/recv_klass, so // we cannot check the actual selected_method (which is not needed by CDS anyway). Without this guard, we will dereference `recv_klass->method_at_vtable(vtable_index)` and will get a SEGV because `recv_klass` is null. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19866#discussion_r1653865166 From kbarrett at openjdk.org Wed Jun 26 05:21:29 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Jun 2024 05:21:29 GMT Subject: RFR: 8333133: Simplify QuickSort::sort [v2] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 08:41:24 GMT, Kim Barrett wrote: >> The "idempotent" argument is removed from that function, with associated >> simplifications to the implementation. Callers are updated to remove that >> argument. Callers that were providing a false value are unaffected in their >> behavior. 
The 3 callers that were providing a true value to request the >> associated feature are also unaffected (other than by being made faster), >> because the arrays involved don't contain any equivalent pairs. >> >> There are also some miscellaneous cleanups, including using the swap utility >> and fixing some comments. >> >> Testing: mach5 tier1-3 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > improve find_pivot description Thanks for reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19464#issuecomment-2190724041 From kbarrett at openjdk.org Wed Jun 26 05:21:29 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Jun 2024 05:21:29 GMT Subject: RFR: 8333133: Simplify QuickSort::sort [v3] In-Reply-To: References: Message-ID: > The "idempotent" argument is removed from that function, with associated > simplifications to the implementation. Callers are updated to remove that > argument. Callers that were providing a false value are unaffected in their > behavior. The 3 callers that were providing a true value to request the > associated feature are also unaffected (other than by being made faster), > because the arrays involved don't contain any equivalent pairs. > > There are also some miscellaneous cleanups, including using the swap utility > and fixing some comments. > > Testing: mach5 tier1-3 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into no-idempotent-quicksort - Merge branch 'master' into no-idempotent-quicksort - improve find_pivot description - remove idempotent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19464/files - new: https://git.openjdk.org/jdk/pull/19464/files/5cee3b81..07dd7040 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19464&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19464&range=01-02 Stats: 73160 lines in 1648 files changed: 45921 ins; 20473 del; 6766 mod Patch: https://git.openjdk.org/jdk/pull/19464.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19464/head:pull/19464 PR: https://git.openjdk.org/jdk/pull/19464 From kbarrett at openjdk.org Wed Jun 26 05:21:29 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Jun 2024 05:21:29 GMT Subject: RFR: 8333133: Simplify QuickSort::sort In-Reply-To: References: Message-ID: On Mon, 17 Jun 2024 12:53:33 GMT, Florian Weimer wrote: > > > It does not provide any such thing. All the flag does is prevent swapping of > > > equivalent elements, which doesn't give us any interesting additional ordering > > > property. > > > > > > I only meant the sort order of the equivalent elements would be maintained. > > I think the partitioning phase swaps inequal elements based on comparison with the pivot, and this can move elements equivalent to the pivot past the pivot, with or without that additional equality check. Yes. I think what @dholmes-ora describes is "stability", which this option does not provide. 
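The distinction drawn above between "never swap equivalent elements" and "stability" can be illustrated with a minimal sketch. This is not HotSpot's `QuickSort::sort`: it is a plain last-element-pivot quicksort, and the names `Item`, `compare_keys`, `guarded_quicksort` and `tags_after_sort` are hypothetical, introduced only for this illustration. The guard skips every swap of two equivalent elements, yet elements with equal keys can still end up in a different relative order.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type: sorted by key, tag records the original order.
struct Item {
  int key;
  char tag;
};

// Three-way comparison on the key only; the tag is ignored.
static int compare_keys(const Item& a, const Item& b) {
  return (a.key > b.key) - (a.key < b.key);
}

// Illustrative quicksort (last element as pivot) that never swaps two
// elements comparing as equivalent. This is an assumption-laden sketch,
// not the HotSpot implementation.
static void guarded_quicksort(std::vector<Item>& a, int lo, int hi) {
  if (lo >= hi) return;
  const Item pivot = a[hi];
  int i = lo;
  for (int j = lo; j < hi; ++j) {
    if (compare_keys(a[j], pivot) < 0) {
      if (compare_keys(a[i], a[j]) != 0) std::swap(a[i], a[j]);  // guard
      ++i;
    }
  }
  if (compare_keys(a[i], a[hi]) != 0) std::swap(a[i], a[hi]);    // guard
  guarded_quicksort(a, lo, i - 1);
  guarded_quicksort(a, i + 1, hi);
}

// Sorts a copy of the input and returns the tags in their final order.
static std::string tags_after_sort(std::vector<Item> items) {
  guarded_quicksort(items, 0, static_cast<int>(items.size()) - 1);
  std::string tags;
  for (const Item& it : items) tags += it.tag;
  return tags;
}
```

For the input `(2,a) (2,b) (1,c)` the only swap performed exchanges `(2,a)` with the pivot `(1,c)`, which are not equivalent, and that alone moves `(2,a)` behind `(2,b)`. So the guard does not make the sort stable, which matches the reply above.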
------------- PR Comment: https://git.openjdk.org/jdk/pull/19464#issuecomment-2190723303 From kbarrett at openjdk.org Wed Jun 26 05:21:29 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Jun 2024 05:21:29 GMT Subject: RFR: 8333133: Simplify QuickSort::sort [v3] In-Reply-To: References: Message-ID: <0mYXOm6IaLlZALCjMW1QqB7qWAVALTu7p726WGgODpA=.e39360ec-fc40-4ba0-a35a-e7d461dfc344@github.com> On Tue, 11 Jun 2024 05:32:03 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into no-idempotent-quicksort >> - Merge branch 'master' into no-idempotent-quicksort >> - improve find_pivot description >> - remove idempotent > > src/hotspot/share/utilities/quickSort.hpp line 43: > >> 41: // We swap these three values into the right place in the array. This >> 42: // means that this method not only returns the index of the pivot >> 43: // element. It also alters the array so that: > > Pre-existing nit: this should be one sentence: "... element, it also ..." I ended up doing a rewrite of the description. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19464#discussion_r1654047792 From kbarrett at openjdk.org Wed Jun 26 05:21:30 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Jun 2024 05:21:30 GMT Subject: Integrated: 8333133: Simplify QuickSort::sort In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:52:03 GMT, Kim Barrett wrote: > The "idempotent" argument is removed from that function, with associated > simplifications to the implementation. Callers are updated to remove that > argument. Callers that were providing a false value are unaffected in their > behavior. 
The 3 callers that were providing a true value to request the > associated feature are also unaffected (other than by being made faster), > because the arrays involved don't contain any equivalent pairs. > > There are also some miscellaneous cleanups, including using the swap utility > and fixing some comments. > > Testing: mach5 tier1-3 This pull request has now been integrated. Changeset: 25c3845b Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/25c3845be270462388ee5e7330cc7315e5c738df Stats: 130 lines in 11 files changed: 5 ins; 97 del; 28 mod 8333133: Simplify QuickSort::sort Reviewed-by: shade, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/19464 From thartmann at openjdk.org Wed Jun 26 06:34:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Jun 2024 06:34:12 GMT Subject: [jdk23] RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 23:50:20 GMT, Volodymyr Paprotski wrote: > Hi all, > > This pull request contains a backport of commit [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Volodymyr Paprotski on 25 Jun 2024 and was reviewed by Sandhya Viswanathan, Vladimir Kozlov, Ferenc Rakoczi and Anthony Scarpino. > > Thanks! Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19893#pullrequestreview-2140832004 From stuefe at openjdk.org Wed Jun 26 06:36:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Jun 2024 06:36:11 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v5] In-Reply-To: References: Message-ID: On Sun, 23 Jun 2024 08:57:26 GMT, Johan Sjölen wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanups > > src/hotspot/share/nmt/nativeCallStackPrinter.cpp line 49: > >> 47: char* store = NEW_ARENA_ARRAY(&_text_storage, char, len + 1); >> 48: strcpy(store, ss.base()); >> 49: (*cached_frame_text) = store; > > Redundant parens Let's keep those. This seems to be in line with general hotspot syntax. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19655#discussion_r1654206412 From sroy at openjdk.org Wed Jun 26 07:01:37 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 26 Jun 2024 07:01:37 GMT Subject: RFR: JDK-8331732 :[PPC64] Unify and optimize code which converts != 0 to 1 Message-ID: [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. Power10 has the "setbc" / "setbcr" instructions. We could create a new function for the conversion and use "setbc" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one which supports both). The new code for MacroAssembler::verify_secondary_supers_table should also use the new function.
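As a rough illustration of what "converts != 0 to 1" without a branch can look like, here is a minimal C++ sketch of one well-known trick: for any nonzero `x`, either `x` or its two's-complement negation has the top bit set, so `(x | -x)` shifted right by the word size minus one yields 1, and 0 otherwise. The function names are hypothetical, and whether the PPC64 assembler code in the PR uses exactly this sequence is an assumption, not something stated in the message above.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only (not HotSpot functions).

// 32-bit variant: returns 1 if x != 0, else 0, without branching.
constexpr uint32_t normalize_bool32(uint32_t x) {
  // (x | -x) has bit 31 set exactly when x is nonzero.
  return (x | (0u - x)) >> 31;
}

// 64-bit variant of the same idea.
constexpr uint64_t normalize_bool64(uint64_t x) {
  // (x | -x) has bit 63 set exactly when x is nonzero.
  return (x | (uint64_t{0} - x)) >> 63;
}

// Compile-time spot checks of the two edge cases.
static_assert(normalize_bool32(0) == 0, "zero stays zero");
static_assert(normalize_bool64(0x8000000000000000ull) == 1, "top bit only");
```

Having separate 32-bit and 64-bit functions mirrors the suggestion in the message; a single templated function would be the "one which supports both" alternative.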
------------- Commit messages: - include the function - Use non branch code to normalize bool Changes: https://git.openjdk.org/jdk/pull/19886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331732 Stats: 45 lines in 7 files changed: 34 ins; 8 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From stuefe at openjdk.org Wed Jun 26 07:08:45 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Jun 2024 07:08:45 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v6] In-Reply-To: References: Message-ID: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - feedback - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - cleanups - increase init buffer - small rework - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - fix windows build - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - caching - Merge branch 'master' into JDK-8333994-NMT-call-stacks-should-show-source-information - ... 
and 4 more: https://git.openjdk.org/jdk/compare/621a534d...12d23df8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19655/files - new: https://git.openjdk.org/jdk/pull/19655/files/c6df7e1d..12d23df8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19655&range=04-05 Stats: 2587 lines in 135 files changed: 1626 ins; 603 del; 358 mod Patch: https://git.openjdk.org/jdk/pull/19655.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19655/head:pull/19655 PR: https://git.openjdk.org/jdk/pull/19655 From stefank at openjdk.org Wed Jun 26 07:41:11 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Jun 2024 07:41:11 GMT Subject: [jdk23] RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: <3yzDRbN8qzDoS-J0YiDADvLLh_XCQIpHRPZCkrDVlfI=.7e3577e9-555e-458a-868b-fa06014c698b@github.com> References: <3yzDRbN8qzDoS-J0YiDADvLLh_XCQIpHRPZCkrDVlfI=.7e3577e9-555e-458a-868b-fa06014c698b@github.com> Message-ID: On Tue, 25 Jun 2024 18:25:02 GMT, Kevin Rushforth wrote: > @shipilev Is the priority (P4) of this bug correct? If so, then it doesn't seem to meet the criteria for JDK 23 during RDP1. If the priority is wrong, please update it in JBS. The status is not correct. For GC bugs we tend to *not* set the priority when bugs are created (so they are left at the default, P4 [which is an unfortunate default, IMHO]), and then the GC triage will assign the correct priority. This bug was fixed before GC triage got to triaging the bug, and hence it didn't get the appropriate priority set. I'll update the priority to a P2. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19851#issuecomment-2191026927 From stefank at openjdk.org Wed Jun 26 07:45:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Jun 2024 07:45:09 GMT Subject: [jdk23] RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 09:02:30 GMT, Aleksey Shipilev wrote: > Clean backport to fix a deadlock. Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19851#pullrequestreview-2140999606 From shade at openjdk.org Wed Jun 26 07:49:16 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Jun 2024 07:49:16 GMT Subject: [jdk23] RFR: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 09:02:30 GMT, Aleksey Shipilev wrote: > Clean backport to fix a deadlock. Yes, I agree this is a P2 bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19851#issuecomment-2191038488 From shade at openjdk.org Wed Jun 26 07:49:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Jun 2024 07:49:17 GMT Subject: [jdk23] Integrated: 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 09:02:30 GMT, Aleksey Shipilev wrote: > Clean backport to fix a deadlock. This pull request has now been integrated. 
Changeset: d1510505 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/d1510505c1fb0c31063e7a1ba3bb6ead4cdd7568 Stats: 12 lines in 6 files changed: 3 ins; 0 del; 9 mod 8334594: Generational ZGC: Deadlock after OopMap rewrites in 8331572 Reviewed-by: stefank Backport-of: 05ff3185edd25b381a97f6879f496e97b62dddc2 ------------- PR: https://git.openjdk.org/jdk/pull/19851 From aph at openjdk.org Wed Jun 26 08:04:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 26 Jun 2024 08:04:17 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v8] In-Reply-To: References: Message-ID: On Sat, 15 Jun 2024 08:48:30 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.) >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): >> >> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
>> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
>> Sec...
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Make sure UseSecondarySupersTable is only used on Power7 or later. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2165: > 2163: } while(0) > 2164: > 2165: // Return true: we succeeded in generating this code You seem to have removed the return value but not its comment.
:-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1654326872 From mdoerr at openjdk.org Wed Jun 26 08:04:18 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Jun 2024 08:04:18 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v8] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 07:59:07 GMT, Andrew Haley wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Make sure UseSecondarySupersTable is only used on Power7 or later. > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2165: > >> 2163: } while(0) >> 2164: >> 2165: // Return true: we succeeded in generating this code > > You seem to have removed the return value but not its comment. :-) Oh, right. I think we can live with it :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1654329788 From rehn at openjdk.org Wed Jun 26 08:11:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Jun 2024 08:11:37 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v18] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running for a very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code churn/fragmentation. > So whether or not hot calls are fast relies on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx. > > This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. > > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'master' into 8332689 - Comments - Missed in merge-fixes, minor revert - Merge branch 'master' into 8332689 - Minor review comments - Merge branch 'master' into 8332689 - To be pushed - Merge branch 'master' into 8332689 - Review comments, removed dead code. - Merge branch 'master' into 8332689 - ...
and 16 more: https://git.openjdk.org/jdk/compare/25c3845b...78a240a9 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=17 Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From fyang at openjdk.org Wed Jun 26 08:26:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Jun 2024 08:26:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v16] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 08:45:12 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3718: >> >>> 3716: // is secondary_supers[r_array_index]. Bits 0 and 1 in the bitmap >>> 3717: // have been checked. >>> 3718: rt_call(StubRoutines::lookup_secondary_supers_table_slow_path_stub()); >> >> Why not make use of the `stub_is_near` param and do a simpler `jump_link` when the slow path stub is near? > > As there were no users of jump_link before this merge, except trampoline call, it always emits JAL as that is the only encoding we need. Hence we only have one place where we use JAL for calls, trampoline call. > > If we think we should use JAL for very short calls, manually adding it to one place is not the right approach IMHO. > Instead we should have rt_call emit JAL for runtime addresses, i.e. fixed address in libjvm.so or fixed address in code cache if reachable. > > For example, I think this site, in gen_continuation_enter(): > `__ rt_call(CAST_FROM_FN_PTR(address, StubRoutines::cont_thaw()));` > Could be beneficial to use JAL in. Sounds reasonable. I guess this change won't affect performance much as it's a call to the slow path stub. I will do JMH for this case to see.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1654359891 From fyang at openjdk.org Wed Jun 26 08:30:16 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Jun 2024 08:30:16 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v18] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 08:11:37 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge branch 'master' into 8332689 > - Comments > - Missed in merge-fixes, minor revert > - Merge branch 'master' into 8332689 > - Minor review comments > - Merge branch 'master' into 8332689 > - To be pushed > - Merge branch 'master' into 8332689 > - Review comments, removed dead code. > - Merge branch 'master' into 8332689 > - ... and 16 more: https://git.openjdk.org/jdk/compare/25c3845b...78a240a9 src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4027: > 4025: } > 4026: > 4027: address MacroAssembler::load_call(Address entry) { How about rename this as something like `indirect_call`? BTW: I may need to take another look at the update of this PR. Hopefully, I can finish this week. Thank you for your patience. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1654365375 From stuefe at openjdk.org Wed Jun 26 08:47:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Jun 2024 08:47:15 GMT Subject: RFR: 8333994: NMT: call stacks should show source information [v6] In-Reply-To: References: Message-ID: On Tue, 18 Jun 2024 20:41:01 GMT, Gerard Ziemski wrote: >>> >>> I have the same question. Did dwarf decoder performance improve? If so, could you point me the PR? Thanks! >> >> I completely forgot that this had been an issue. 
The comment was even written by me :( >> >> No, Elf decoder is still slow. But I have found myself too many times staring at NMT output now trying to make sense of the offsets. Missing source info in combination with the small stack size of 4 makes investigations a pain. >> >> I added a simple caching mechanism to aid printing. Its pretty straight-forward, but still I am not sure it is worth the complexity. Here the numbers: >> >> Running all NMT jtreg tests: >> - Stock JVM (no source info): 40 seconds >> - Source info: 2 min 30 seconds >> - Source info + caching: 1 min 15 seconds >> >> I think that is acceptable. Any more intricate caching would be over the complexity-benefit line. >> >> @gerard-ziemski >> >> The cost is with Dwarf parsing, not dladdr. dladdr is cheap. But feel free to make Dwarf parsing cheaper, that would be surely welcome. > >> > I have the same question. Did dwarf decoder performance improve? If so, could you point me the PR? Thanks! >> >> I completely forgot that this had been an issue. The comment was even written by me :( >> >> No, Elf decoder is still slow. But I have found myself too many times staring at NMT output now trying to make sense of the offsets. Missing source info in combination with the small stack size of 4 makes investigations a pain. >> >> I added a simple caching mechanism to aid printing. Its pretty straight-forward, but still I am not sure it is worth the complexity. Here the numbers: >> >> Running all NMT jtreg tests: >> >> * Stock JVM (no source info): 40 seconds >> * Source info: 2 min 30 seconds >> * Source info + caching: 1 min 15 seconds >> >> I think that is acceptable. Any more intricate caching would be over the complexity-benefit line. > > I simply pointed out your own old concern. If you are happy with the final performance now, then I'm good. > > I will look at the cache shortly. Thanks @gerard-ziemski @jdksjolen ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19655#issuecomment-2191147623 From stuefe at openjdk.org Wed Jun 26 08:47:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 26 Jun 2024 08:47:16 GMT Subject: Integrated: 8333994: NMT: call stacks should show source information In-Reply-To: References: Message-ID: On Tue, 11 Jun 2024 12:38:09 GMT, Thomas Stuefe wrote: > Patch adds printing of source file and line number to NMT call stacks that are printed in detail mode. Useful for hunting down leaks. This pull request has now been integrated. Changeset: e1390056 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/e1390056c9dbf0a02a131864ebee23435e997852 Stats: 198 lines in 8 files changed: 135 ins; 17 del; 46 mod 8333994: NMT: call stacks should show source information Reviewed-by: jsjolen, gziemski ------------- PR: https://git.openjdk.org/jdk/pull/19655 From rehn at openjdk.org Wed Jun 26 08:59:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Jun 2024 08:59:17 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v16] In-Reply-To: References: Message-ID: <134O9aga12QdNZ6_n7eDAmaQdP2BIZEsyFE96LIiFgE=.e4ce4c70-1505-4f73-a16c-7dcea05905f1@github.com> On Wed, 26 Jun 2024 08:22:40 GMT, Fei Yang wrote: >> As there were no users of jump_link before this merge, except trampoline call, it always emits JAL as that is the only encoding we need. Hence we only have one place where we use JAL for calls, trampoline call. >> >> If we think we should use JAL for very short calls, manually adding it to one place is not the right approach IMHO. >> Instead we should have rt_call emit JAL for runtime addresses, i.e. fixed address in libjvm.so or fixed address in code cache if reachable. >> >> For example, I think this site, in gen_continuation_enter(): >> `__ rt_call(CAST_FROM_FN_PTR(address, StubRoutines::cont_thaw()));` >> Could be beneficial to use JAL in. > > Sounds reasonable.
I guess this change won't affect performance much as it's call to the slow path stub. I will do JMH for this case to see. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1654413923 From rehn at openjdk.org Wed Jun 26 09:09:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Jun 2024 09:09:15 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v18] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 08:26:28 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: >> >> - Merge branch 'master' into 8332689 >> - Comments >> - Missed in merge-fixes, minor revert >> - Merge branch 'master' into 8332689 >> - Minor review comments >> - Merge branch 'master' into 8332689 >> - To be pushed >> - Merge branch 'master' into 8332689 >> - Review comments, removed dead code. >> - Merge branch 'master' into 8332689 >> - ... and 16 more: https://git.openjdk.org/jdk/compare/25c3845b...78a240a9 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4027: > >> 4025: } >> 4026: >> 4027: address MacroAssembler::load_call(Address entry) { > > How about rename this as something like `indirect_call`? > > BTW: I may need to take another look at the update of this PR. Hopefully, I can finish this week. Thank you for your patience. The reason I don't like "indirect_call", is that all JALR jumps are indirect: `The indirect jump instruction JALR (jump and link register)` Maybe the middle ground: "indirect_load_call()" ? I can live with indirect_call also, as it's indirect from programmers POV vs movptr + jalr. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1654428332 From sgehwolf at openjdk.org Wed Jun 26 10:04:13 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 26 Jun 2024 10:04:13 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v7] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:54:46 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read-only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed.
>> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Refactor mount info matching to helper function > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Remove problem listing of PlainRead which is reworked here > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - ... and 7 more: https://git.openjdk.org/jdk/compare/baafa662...532ea33b Could I get a second review on this please? @larry-cable maybe? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2191300003 From jsjolen at openjdk.org Wed Jun 26 10:09:26 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 26 Jun 2024 10:09:26 GMT Subject: RFR: 8333658: NMT: Use an allocator with 4-byte pointers to save memory in NativeCallStackStorage [v34] In-Reply-To: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> References: <4uxxepR1o8-X2knnKyxZOK-lkiAZCe0tp-GR1MtkCc8=.36c8d96a-2683-40df-8ca2-f870b31b1eab@github.com> Message-ID: On Tue, 25 Jun 2024 13:39:58 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new allocator, `IndexedFreelistAllocator`.
It uses a `GrowableArray` in order to have 4-byte "pointers" to its elements and also works as a freelist as unused slots form a linked list. This allocator cannot shrink its used memory. I'm always open for better names. >> >> We then use this allocator in order to store the `NativeCallStackStorage` hash table. This saves `4 * 4099 = ~16KiB` of memory as each table element is now only 4 bytes. For each stored stack it also saves 4 bytes. The fact that the allocator cannot shrink is fine, we do not ever remove stacks from the storage. >> >> The main point of this is to introduce a pointer-generic experimental API, so I also implemented CHeap and Arena allocators. It's currently placed in NMT, but we might want to move it into utilities. It uses a bit of template-magic, but my IDE (Emacs+clangd) was actually able to figure things out when the types didn't line up correctly etc, so it's not an enemy to IDE help. >> >> It sounded expensive to have the GrowableArray continuously realloc its underlying data array, so I did a basic test where we allocate 1 000 000 stacks and push them into NativeCallStackStorage backed by different allocators. This is available in the PR. >> >> The results are as follows on linux-x64-slowdebug: >> >> >> Generate stacks... Done >> Time taken with GrowableArray: 8341.240945 >> Time taken with CHeap: 12189.031318 >> Time taken with Arena: 8800.703092 >> Time taken with GrowableArray again: 8295.508829 >> >> >> And on linux-x64: >> >> >> Time taken with GrowableArray: 8378.018135 >> Time taken with CHeap: 12437.347868 >> Time taken with Arena: 8758.064717 >> Time taken with GrowableArray again: 8391.076291 >> >> >> Obviously, this is a very basic benchmark, but it seems like we're faster than CHeap and Arena for this case.
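The description above (4-byte indices into a growable array, with unused slots chained into a freelist) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the PR's `IndexedFreelistAllocator`: it uses `std::vector` where the PR uses `GrowableArray`, it only handles trivial element types, and the names `IndexedFreelist`, `allocate`, `release`, `at` and `capacity` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical illustration: elements are addressed by a 32-bit index instead
// of a 64-bit pointer, and released slots are reused through an intrusive
// freelist that lives inside the unused slots themselves.
template <typename E>
class IndexedFreelist {
  static_assert(std::is_trivial<E>::value,
                "this sketch only supports trivial element types");

public:
  using Index = int32_t;            // the 4-byte "pointer"
  static constexpr Index nil = -1;  // marks the end of the freelist

  // Store a copy of value and return its index.
  Index allocate(const E& value) {
    Index i;
    if (_free_head != nil) {
      // Reuse the most recently released slot.
      i = _free_head;
      _free_head = _slots[i].next_free;
    } else {
      // No free slot available: grow the backing array by one.
      i = static_cast<Index>(_slots.size());
      _slots.emplace_back();
    }
    _slots[i].value = value;
    return i;
  }

  // Put slot i back on the freelist. The backing array never shrinks.
  void release(Index i) {
    _slots[i].next_free = _free_head;
    _free_head = i;
  }

  E& at(Index i) { return _slots[i].value; }
  std::size_t capacity() const { return _slots.size(); }

private:
  // A slot holds either a live element or the index of the next free slot.
  union Slot {
    Index next_free;
    E value;
  };

  std::vector<Slot> _slots;
  Index _free_head = nil;
};
```

Because callers hold indices rather than addresses, the backing array can be reallocated on growth without invalidating anything the callers keep, which is presumably what makes a growable array usable as the backing store here.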
> > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Do not use char array > _Mailing list message from [Thomas Stüfe](mailto:thomas.stuefe at gmail.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.org):_ > > Johan, I would just remove the noncopyable. It's questionable anyway, there is no reason one shouldn't be able to copy the AWFL, in particular because client code just uses indices, not pointers. > > On Tue 25. Jun 2024 at 21:01, Sonia Zaldana Calles wrote: > > -------------- next part -------------- An HTML attachment was scrubbed... URL: Whichever is fine by me. The class will still be non-copyable however, as GrowableArrayCHeap already is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18979#issuecomment-2191308811 From mbaesken at openjdk.org Wed Jun 26 11:26:25 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Jun 2024 11:26:25 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' [v2] In-Reply-To: References: Message-ID: <1amBBr0FLAjGNuaS1wDLzvCrelfI1wFiT_WB61OCkZk=.92b61bc1-b69a-4f14-894b-538622bc60ef@github.com> > With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : > > runtime/CommandLine/PrintClasses_id0.jtr > > src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' > #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 > #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0xffffac7bc4f4 in VM_PrintClasses::doit()
src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) > #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: introduce print_on_maybe_null ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19885/files - new: https://git.openjdk.org/jdk/pull/19885/files/530375fb..fb63c5a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19885&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19885&range=00-01 Stats: 50 lines in 2 files changed: 9 ins; 31 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19885/head:pull/19885 PR: https://git.openjdk.org/jdk/pull/19885 From mbaesken at openjdk.org Wed Jun 26 11:37:11 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Jun 2024 11:37:11 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' [v2] In-Reply-To: <1amBBr0FLAjGNuaS1wDLzvCrelfI1wFiT_WB61OCkZk=.92b61bc1-b69a-4f14-894b-538622bc60ef@github.com> References: <1amBBr0FLAjGNuaS1wDLzvCrelfI1wFiT_WB61OCkZk=.92b61bc1-b69a-4f14-894b-538622bc60ef@github.com> Message-ID: On Wed, 26 Jun 2024 
11:26:25 GMT, Matthias Baesken wrote: >> With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : >> >> runtime/CommandLine/PrintClasses_id0.jtr >> >> src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' >> #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 >> #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 >> #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 >> #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) >> #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > introduce print_on_maybe_null Hi Coleen, that print_on_maybe_null template is a great idea ! Added that and used it at the places where we check for nullptr. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19885#issuecomment-2191471123 From stefank at openjdk.org Wed Jun 26 11:55:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Jun 2024 11:55:13 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' [v2] In-Reply-To: <1amBBr0FLAjGNuaS1wDLzvCrelfI1wFiT_WB61OCkZk=.92b61bc1-b69a-4f14-894b-538622bc60ef@github.com> References: <1amBBr0FLAjGNuaS1wDLzvCrelfI1wFiT_WB61OCkZk=.92b61bc1-b69a-4f14-894b-538622bc60ef@github.com> Message-ID: On Wed, 26 Jun 2024 11:26:25 GMT, Matthias Baesken wrote: >> With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : >> >> runtime/CommandLine/PrintClasses_id0.jtr >> >> src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' >> #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 >> #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 >> #10 0xffffae396958 in 
Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 >> #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) >> #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > introduce print_on_maybe_null src/hotspot/share/oops/instanceKlass.cpp line 3600: > 3598: print_on_maybe_null(st, BULLET"class annotations: ", class_annotations()); > 3599: print_on_maybe_null(st, BULLET"class type annotations: ", class_type_annotations()); > 3600: print_on_maybe_null(st, BULLET"field annotations: ", fields_annotations()); The number of spaces in the string got changed here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19885#discussion_r1654668686 From rehn at openjdk.org Wed Jun 26 12:07:14 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 26 Jun 2024 12:07:14 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v18] In-Reply-To: References: Message-ID: <61wdY_KwUmEChcpHb175olPpicC0P6hk9wCqHPSMyQQ=.9624f037-a3a7-4e34-93db-8c820d15731e@github.com> On Wed, 26 Jun 2024 08:11:37 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get hot fast calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
>> Even if you don't have that problem, having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa. >> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: > > - Merge branch 'master' into 8332689 > - Comments > - Missed in merge-fixes, minor revert > - Merge branch 'master' into 8332689 > - Minor review comments > - Merge branch 'master' into 8332689 > - To be pushed > - Merge branch 'master' into 8332689 > - Review comments, removed dead code. > - Merge branch 'master' into 8332689 > - ...
and 16 more: https://git.openjdk.org/jdk/compare/25c3845b...78a240a9 FYI, if nothing else as reference, rebased patched using constant pool here (sanity tested): https://github.com/robehn/jdk/compare/8332689...robehn:jdk:8332689_cp ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2191522484 From mbaesken at openjdk.org Wed Jun 26 12:57:21 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Jun 2024 12:57:21 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' [v3] In-Reply-To: References: Message-ID: > With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : > > runtime/CommandLine/PrintClasses_id0.jtr > > src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' > #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 > #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > #10 0xffffae396958 in Thread::call_run() 
src/hotspot/share/runtime/thread.cpp:225 > #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) > #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: add some blanks removed before ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19885/files - new: https://git.openjdk.org/jdk/pull/19885/files/fb63c5a7..fd996422 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19885&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19885&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19885/head:pull/19885 PR: https://git.openjdk.org/jdk/pull/19885 From coleenp at openjdk.org Wed Jun 26 13:04:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Jun 2024 13:04:09 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' [v3] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 12:57:21 GMT, Matthias Baesken wrote: >> With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : >> >> runtime/CommandLine/PrintClasses_id0.jtr >> >> src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' >> #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 >> #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) 
src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 >> #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 >> #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) >> #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add some blanks removed before This looks good. I think print_on_maybe_null will come in handy. Thank you for doing this ubsan work. ------------- Marked as reviewed by coleenp (Reviewer). 
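For readers of this thread, a minimal sketch of the pattern behind a "maybe null" print helper. This is not the HotSpot code: `Annotations`, `describe` and `describe_maybe_null` are hypothetical stand-ins, and the real `print_on_maybe_null` in the PR may well have a different signature. The only point illustrated is that the null check has to happen before any member call, because calling a member function through a null pointer is undefined behavior, which is what ubsan reports in the trace above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in type; not HotSpot's AnnotationArray.
struct Annotations {
  int length;
  std::string describe() const {
    return "annotations[" + std::to_string(length) + "]";
  }
};

// Hypothetical helper: test the pointer first, and only then call a member.
// Writing a->describe() unconditionally would be the "member call on null
// pointer" that ubsan flags, even if describe() itself never touched *this.
inline std::string describe_maybe_null(const Annotations* a) {
  return a == nullptr ? std::string("null") : a->describe();
}

// Small self-check of the sketch.
inline void check_describe_maybe_null() {
  Annotations three{3};
  assert(describe_maybe_null(&three) == "annotations[3]");
  assert(describe_maybe_null(nullptr) == "null");
}
```

Keeping the check in a free (or static) function means the caller never has to dereference the pointer to find out whether it is null.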
PR Review: https://git.openjdk.org/jdk/pull/19885#pullrequestreview-2141749927 From stefank at openjdk.org Wed Jun 26 13:08:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 26 Jun 2024 13:08:10 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' [v3] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 12:57:21 GMT, Matthias Baesken wrote: >> With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : >> >> runtime/CommandLine/PrintClasses_id0.jtr >> >> src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' >> #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 >> #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 >> #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 >> #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) >> #13 
0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add some blanks removed before Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19885#pullrequestreview-2141762302 From mbaesken at openjdk.org Wed Jun 26 13:37:36 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 26 Jun 2024 13:37:36 GMT Subject: RFR: 8333144: docker tests do not work when ubsan is configured Message-ID: Currently when we run with ubsan - enabled binaries (configure option --enable-ubsan, see [JDK-8298448](https://bugs.openjdk.org/browse/JDK-8298448)), the docker tests do not work. We find this in the test output [STDOUT] /jdk/bin/java: error while loading shared libraries: libubsan.so.1: cannot open shared object file: No such file or directory The container where the test is executed does not contain the ubsan package; we might skip the test in this case. 
------------- Commit messages: - JDK-8333144 Changes: https://git.openjdk.org/jdk/pull/19907/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19907&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333144 Stats: 24 lines in 3 files changed: 20 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19907/head:pull/19907 PR: https://git.openjdk.org/jdk/pull/19907 From aph at openjdk.org Wed Jun 26 15:27:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 26 Jun 2024 15:27:17 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> Message-ID: <8VJaOlJAI9YZcNj2vpZukR_wbiVJWE5Z0a1A3_El47Y=.f9f748e5-09ef-4185-a8d7-c8accaa17701@github.com> On Mon, 24 Jun 2024 15:37:43 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). >> >> Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depends on external sleef things (header or lib) at build or run time. >> Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk. >> >> Besides of the code changes, one important task is to handle the legal process. >> >> Thanks! >> >> ## Performance >> NOTE: >> * `Src` means implementation in this pr, i.e. without depenency on external sleef. >> * `Disabled` means disable intrinsics by `-XX:-UseVectorStubs` >> * `system_sleef` means implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. 
build and run jdk with dependency on external sleef. >> >> Basically, the perf data below shows that >> * this implementation has better performance than the previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294), >> * and both sleef versions have much better performance compared with the non-sleef version.
>>
>> |Benchmark |(size)|Src |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disabled-Src)/Src|
>> |------------------------------|------|---------|-----|------------|----------------------|---------|-----------------|
>> |3472:Double128Vector.ACOS |1024 |8546.842 |ns/op|8516.007 |-0.004 |16799.273|0.966 |
>> |3473:Double128Vector.ASIN |1024 |6864.656 |ns/op|6987.328 |0.018 |16602.442|1.419 |
>> |3474:Double128Vector.ATAN |1024 |11489.255|ns/op|12261.800 |0.067 |26329.320|1.292 |
>> |3475:Double128Vector.ATAN2 |1024 |16661.170|ns/op|17234.472 |0.034 |42084.100|1.526 |
>> |3476:Double128Vector.CBRT |1024 |18999.387|ns/op|20298.458 |0.068 |35998.688|0.895 |
>> |3477:Double128Vector.COS |1024 |14081.857|ns/op|14846.117 |0.054 |24420.692|0.734 |
>> |3478:Double128Vector.COSH |1024 |12202.306|ns/op|12237.772 |0.003 |21343.863|0.749 |
>> |3479:Double128Vector.EXP |1024 |4553.108 |ns/op|4777.638 ...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - merge master > - sleef 3.6.1 for riscv > - sleef 3.6.1 > - update header files for arm > - add inline header file for riscv64 > - remove notes about sleef changes > - fix performance issue > - disable unused-function warnings; add log msg > - minor > - minor > - ... and 22 more: https://git.openjdk.org/jdk/compare/ed149062...fe4be2c6 Please tell me the exact command line you used to produce these benchmark results.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2191988240 From sviswanathan at openjdk.org Wed Jun 26 15:27:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 26 Jun 2024 15:27:23 GMT Subject: [jdk23] RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 23:50:20 GMT, Volodymyr Paprotski wrote: > Hi all, > > This pull request contains a backport of commit [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Volodymyr Paprotski on 25 Jun 2024 and was reviewed by Sandhya Viswanathan, Vladimir Kozlov, Ferenc Rakoczi and Anthony Scarpino. > > Thanks! Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19893#pullrequestreview-2142217862 From duke at openjdk.org Wed Jun 26 15:27:23 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 26 Jun 2024 15:27:23 GMT Subject: [jdk23] Integrated: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 23:50:20 GMT, Volodymyr Paprotski wrote: > Hi all, > > This pull request contains a backport of commit [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Volodymyr Paprotski on 25 Jun 2024 and was reviewed by Sandhya Viswanathan, Vladimir Kozlov, Ferenc Rakoczi and Anthony Scarpino. > > Thanks! This pull request has now been integrated. 
Changeset: b5fbdb21 Author: Volodymyr Paprotski Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/b5fbdb2166352df63d9d9f5481f3b079238d6f90 Stats: 125 lines in 9 files changed: 35 ins; 52 del; 38 mod 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 Reviewed-by: thartmann, sviswanathan Backport-of: f101e153cee68750fcf1f12da10e29806875b522 ------------- PR: https://git.openjdk.org/jdk/pull/19893 From szaldana at openjdk.org Wed Jun 26 16:13:31 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 26 Jun 2024 16:13:31 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates Message-ID: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Hi all, This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). The error arises as template-id is not allowed for constructor/destructor in C++20. Testing: - [x] Compilation succeeds with g++ 14.1.1. Thanks, Sonia ------------- Commit messages: - 8335108: Build error after JDK-8333658 due to class templates Changes: https://git.openjdk.org/jdk/pull/19890/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19890&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335108 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19890/head:pull/19890 PR: https://git.openjdk.org/jdk/pull/19890 From jwaters at openjdk.org Wed Jun 26 16:13:31 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 26 Jun 2024 16:13:31 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR 
addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia Ah, this annoying g++ issue again. I honestly don't like that we have to adhere to C++20 without any of its benefits while being on C++14, but I think Kim has mentioned that this is a desirable warning to have, so approved ------------- Marked as reviewed by jwaters (Committer). PR Review: https://git.openjdk.org/jdk/pull/19890#pullrequestreview-2140672045 From jsjolen at openjdk.org Wed Jun 26 16:13:31 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 26 Jun 2024 16:13:31 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia Still in draft, but LGTM. ------------- Marked as reviewed by jsjolen (Reviewer). 
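A minimal sketch of the kind of declaration this fix is about, for readers unfamiliar with the rule. `Holder` is a hypothetical class template, not the JDK class touched by the PR. In C++20 the constructor and destructor of a class template may no longer be declared with a template-id; to my understanding g++ 14 reports the old spelling through `-Wtemplate-id-cdtor`, which becomes a build error when warnings are treated as errors.

```cpp
#include <cassert>

// Hypothetical class template for illustration only.
template <typename T>
class Holder {
  T _value;

 public:
  // Old spelling, no longer valid in C++20 (shown only as a comment):
  //   Holder<T>(T value) : _value(value) {}
  //   ~Holder<T>() {}
  // Accepted spelling: the injected class name without template arguments.
  explicit Holder(T value) : _value(value) {}
  ~Holder() {}

  T get() const { return _value; }
};

// Small self-check of the sketch.
inline void check_holder() {
  Holder<int> h(42);
  assert(h.get() == 42);
}
```

Inside the class body, `Holder` already names the current specialization, so dropping the `<T>` changes nothing about what is declared.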
PR Review: https://git.openjdk.org/jdk/pull/19890#pullrequestreview-2141343336 From szaldana at openjdk.org Wed Jun 26 16:23:13 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 26 Jun 2024 16:23:13 GMT Subject: Integrated: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia This pull request has now been integrated. Changeset: b5d58962 Author: Sonia Zaldana Calles Committer: Julian Waters URL: https://git.openjdk.org/jdk/commit/b5d589623c174757e946011495f771718318f1cc Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8335108: Build error after JDK-8333658 due to class templates Reviewed-by: jwaters, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/19890 From sroy at openjdk.org Wed Jun 26 16:45:25 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 26 Jun 2024 16:45:25 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v2] In-Reply-To: References: Message-ID: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. 
> > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: empty lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/67e186de..ce552346 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From amitkumar at openjdk.org Wed Jun 26 16:45:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Jun 2024 16:45:25 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v2] In-Reply-To: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> References: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> Message-ID: On Wed, 26 Jun 2024 16:42:37 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. 
>> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > empty lines src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2408: > 2406: > 2407: // convert !=0 to 1 > 2408: normalize_bool(result,R0,true); Suggestion: normalize_bool(result, R0, true); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1655210461 From sroy at openjdk.org Wed Jun 26 16:52:35 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Wed, 26 Jun 2024 16:52:35 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v3] In-Reply-To: References: Message-ID: <9V7t8GgSok8bSTA5zYzfy23vyFN-mQPL4BtYjraf76Y=.e03244df-b669-4a89-8532-fd524dc5aa54@github.com> > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
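To make the "convert != 0 to 1" idea concrete, here is a minimal sketch in portable C++ rather than PPC assembler. All function names are hypothetical and this is not the HotSpot code. The first two helpers show one common branch-free formulation; the actual instruction sequence in the template interpreter may differ. The last two model the Power10 path using the `setbcr` semantics as a reviewer describes them later in this thread (result is 0 if the tested CR bit is 1, otherwise 1).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names for illustration; this is not HotSpot code.

// Branch-free form: (x | -x) has its top bit set exactly when x != 0,
// so shifting that bit down yields 0 or 1.
constexpr uint32_t normalize_bool32(uint32_t x) {
  return (x | (0u - x)) >> 31;
}

constexpr uint64_t normalize_bool64(uint64_t x) {
  return (x | (uint64_t{0} - x)) >> 63;
}

// Model of setbcr as described in the review: 0 if the tested
// condition-register bit is 1, and 1 otherwise.
constexpr uint64_t setbcr_model(bool cr_bit) {
  return cr_bit ? 0 : 1;
}

// Power10-style path: compare with zero, then set from the reversed
// "equal" bit of the comparison.
constexpr uint64_t normalize_bool_p10(uint64_t x) {
  return setbcr_model(x == 0);
}

// Small self-check of the sketch.
inline void check_normalize_bool() {
  assert(normalize_bool32(0) == 0);
  assert(normalize_bool32(7) == 1);
  assert(normalize_bool32(0x80000000u) == 1);
  assert(normalize_bool64(uint64_t{1} << 40) == 1);
  assert(normalize_bool_p10(0) == 0);
  assert(normalize_bool_p10(255) == 1);
}
```

Both paths agree for every input; the separate 32-bit and 64-bit variants matter because a 32-bit shift would miss a value whose only set bits are in the upper half of a 64-bit register.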
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: space after comma ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/ce552346..5d052d19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From amitkumar at openjdk.org Wed Jun 26 17:01:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 26 Jun 2024 17:01:15 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v2] In-Reply-To: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> References: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> Message-ID: <-SRQP2arPvQSbKpKXBvKxo4OQwWK-O_vPPbVxyhR8Bg=.bf68f7aa-5466-426c-ac9f-78ce7eb8cf7c@github.com> On Wed, 26 Jun 2024 16:45:25 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
> > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > empty lines src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 349: > 347: // Set register dst to true if dst is non zero using temp for calculations on Power Version<10. > 348: // Set register dst to true if dst is non zero for Power 10 and above machines. > 349: void MacroAssembler::normalize_bool(Register dst, Register temp, bool use_64bit) { I would've used `is_64bit/is_long` instead of `use_64bit`. But that's a personal choice, and I leave it up to you. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 356: > 354: else > 355: cmpwi(CCR0, dst, 0); > 356: setbcr(dst, CCR0, Assembler::zero); This is what I understood from the implementation & definition: If bit BI of the CR contains a 1, register RT is set to 0. Otherwise, register RT is set to 1. CCR0 will contain `1` when `dst == 0`. Then `dst` will be set to `1` by `setbcr`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1655235075 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1655232478 From thartmann at openjdk.org Wed Jun 26 17:17:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Jun 2024 17:17:11 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out [v2] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 02:33:25 GMT, Vladimir Kozlov wrote: >> The timeout is caused by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. >> The test continuously deoptimizes and recompiles the `java.lang.Throwable::` method. >> `-XX:+VerifyOops ` adds a lot of external addresses because it uses ExternalAddress for error messages. >> These messages are unique for each call to `verify_oop()` because they are constructed locally.
>> I reduced the number of loop iterations by a factor of 10 to get a reasonable execution time (2 mins instead of 20 mins) and got the following data: >> >> Without VerifyOops: >> External addresses table: 38 entries >> >> With VerifyOops: >> External addresses table: 125922 entries >> >> It looks like the VM spends most of the time growing/reallocating the big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). >> Before that change, these addresses were recorded locally in the nmethod's relocation info. When an nmethod was deoptimized, its relocation info was discarded together with the nmethod. >> >> Only on x86 did we declare the message address as ExternalAddress. AArch64 uses movptr() to store the address as a simple pointer. ARM uses its own InlineAddress with the RelocInfo::none type of relocation. I verified the other platforms: none uses ExternalAddress. >> >> I suggest using AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: >> >> External addresses table: 42 entries >> >> >> Tested tier1; ran the test with the corresponding flags to verify that the time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use external_word_Relocation::spec_for_immediate() instead of relocInfo::none Looks good as a stopgap solution. Do you think there are opportunities to improve the GrowableArray situation? ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19871#pullrequestreview-2142517052 From kvn at openjdk.org Wed Jun 26 17:33:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Jun 2024 17:33:11 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out [v2] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:14:24 GMT, Tobias Hartmann wrote: > Looks good as a stopgap solution.
Do you think there are opportunities to improve the GrowableArray situation? I have already RFE for that: https://bugs.openjdk.org/browse/JDK-8334691 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2192266133 From vlivanov at openjdk.org Wed Jun 26 17:56:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Jun 2024 17:56:18 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA Message-ID: JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. Testing: hs-tier1 - hs-tier6 ------------- Commit messages: - Remove -XX:-UseVtableBasedCHA Changes: https://git.openjdk.org/jdk/pull/19911/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19911&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304693 Stats: 166 lines in 11 files changed: 1 ins; 154 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19911.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19911/head:pull/19911 PR: https://git.openjdk.org/jdk/pull/19911 From kvn at openjdk.org Wed Jun 26 18:07:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Jun 2024 18:07:10 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:47:57 GMT, Vladimir Ivanov wrote: > JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. > > Testing: hs-tier1 - hs-tier6 Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19911#pullrequestreview-2142609114 From coleenp at openjdk.org Wed Jun 26 18:07:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Jun 2024 18:07:11 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:47:57 GMT, Vladimir Ivanov wrote: > JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. > > Testing: hs-tier1 - hs-tier6 Thanks for fixing the add_to_hierarchy code. src/hotspot/share/runtime/arguments.cpp line 525: > 523: > 524: { "HeapFirstMaximumCompactionCount", JDK_Version::undefined(), JDK_Version::jdk(24), JDK_Version::jdk(25) }, > 525: { "UseVtableBasedCHA", JDK_Version::undefined(), JDK_Version::jdk(24), JDK_Version::jdk(25) }, We don't usually put diagnostic VM options in this table. ------------- PR Review: https://git.openjdk.org/jdk/pull/19911#pullrequestreview-2142605312 PR Review Comment: https://git.openjdk.org/jdk/pull/19911#discussion_r1655309826 From vlivanov at openjdk.org Wed Jun 26 18:13:09 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Jun 2024 18:13:09 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 18:00:12 GMT, Coleen Phillimore wrote: >> JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. 
>> >> Testing: hs-tier1 - hs-tier6 > > src/hotspot/share/runtime/arguments.cpp line 525: > >> 523: >> 524: { "HeapFirstMaximumCompactionCount", JDK_Version::undefined(), JDK_Version::jdk(24), JDK_Version::jdk(25) }, >> 525: { "UseVtableBasedCHA", JDK_Version::undefined(), JDK_Version::jdk(24), JDK_Version::jdk(25) }, > > We don't usually put diagnostic VM options in this table. Ok, I was confused by the following comment before the table: * To remove internal options (e.g. diagnostic, experimental, develop options), use * a 2-step model adding major release numbers to the obsolete and expire columns. Does it make sense to adjust it as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19911#discussion_r1655326163 From matsaave at openjdk.org Wed Jun 26 18:19:13 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 26 Jun 2024 18:19:13 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v2] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 03:11:41 GMT, Ioi Lam wrote: >> Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. >> >> - This PR uses the same framework introduced in #19355 and just added handling for methods. >> - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into 8309634-resolve-methods-at-dumptime > - @calvinccheung and @matias9927 comments > - Fixed whitespaces > - 8309634: Resolve CONSTANT_MethodRef at CDS dump time Thanks for the updates! ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19866#pullrequestreview-2142647343 From thartmann at openjdk.org Wed Jun 26 18:57:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Jun 2024 18:57:10 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out [v2] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 02:33:25 GMT, Vladimir Kozlov wrote: >> The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. >> The test continuously deoptimize and recompile `java.lang.Throwable::` method. >> `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. >> These messages are unique for each call to `verify_oop()` because they are constructed locally. >> I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: >> >> Without VerifyOops: >> External addresses table: 38 entries >> >> With VerifyOops: >> External addresses table: 125922 entries >> >> Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). >> Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. >> >> Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. >> >> I suggest to use AddressLiteral with RelocInfo::none for x86. 
With that the global table is small even with -XX:+VerifyOops: >> >> External addresses table: 42 entries >> >> >> Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use external_word_Relocation::spec_for_immediate() instead of relocInfo::none Ah, perfect! Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2192426384 From mdoerr at openjdk.org Wed Jun 26 19:39:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Jun 2024 19:39:11 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: References: Message-ID: <_cOM0Ka4W1Lf9aDovwWevwf21OWhm9lg1p3GV8wmqro=.db3001ad-6a46-4c37-9203-9fca2762369a@github.com> On Mon, 24 Jun 2024 14:02:15 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp >> - rename: r_scratch to r_result in repne_scan method > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3275: > >> 3273: call_stub(StubRoutines::lookup_secondary_supers_table_slow_path_stub()); >> 3274: >> 3275: z_bru(L_done); // pass whatever result we got from a slow path > > This one branch could be saved by using "load immediate on condition". But it's after slow path processing. Right, looks like we only reach here with "false" condition or after return from the stub which should have set the condition code accordingly, too (should be checked / enforced!). 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1655423255

From mdoerr at openjdk.org Wed Jun 26 19:43:11 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 26 Jun 2024 19:43:11 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To: References: Message-ID:

On Tue, 25 Jun 2024 14:19:44 GMT, Amit Kumar wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op
>>
>> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op
>>
>> Benchmark Mode Cnt Score Error Units
>> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   add2reg -> z_la

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3328:

> 3326:
> 3327: // NOTE: please load 0 only in r_result, as this is also being used for z_locgr down
> 3328: clear_reg(r_result, true /* whole_reg */, false /* set_cc */); // let's hope that search will be a success

Is this needed? It should be already done by `lookup_secondary_supers_table`. Could be asm_assert'ed.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1655427029 From mdoerr at openjdk.org Wed Jun 26 19:49:12 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Jun 2024 19:49:12 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 14:19:44 GMT, Amit Kumar wrote: > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > add2reg -> z_la src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3414: > 3412: #ifdef ASSERT > 3413: { > 3414: // r_result should have either 0 or 1 value Could be done with one check (clear bit 0). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1655432078 From mdoerr at openjdk.org Wed Jun 26 19:53:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Jun 2024 19:53:13 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 14:19:44 GMT, Amit Kumar wrote:
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > add2reg -> z_la src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3452: > 3450: z_lgr(Z_ARG4, r_result); > 3451: const char* msg = "mismatch"; > 3452: load_const_optimized(Z_ARG5, (address)msg); Did you test this? It breaks when you have a register collision (see my assert_different_registers on PPC64). You can test it by removing the `z_bre` above and checking if the arguments are correct. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 720: > 718: r_result = Z_R11; > 719: address start = __ pc(); > 720: Extra empty line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1655435498 PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1655436383 From coleenp at openjdk.org Wed Jun 26 20:21:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Jun 2024 20:21:11 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 18:10:29 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/runtime/arguments.cpp line 525: >> >>> 523: >>> 524: { "HeapFirstMaximumCompactionCount", JDK_Version::undefined(), JDK_Version::jdk(24), JDK_Version::jdk(25) }, >>> 525: { "UseVtableBasedCHA", JDK_Version::undefined(), JDK_Version::jdk(24), JDK_Version::jdk(25) }, >> >> We don't usually put diagnostic VM options in this table. > > Ok, I was confused by the following comment before the table: > > * To remove internal options (e.g. diagnostic, experimental, develop options), use > * a 2-step model adding major release numbers to the obsolete and expire columns. > > Does it make sense to adjust it as well? Oh, right, that is what the comment says. I'd forgotten about the 2-step process. 
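
The 2-step model mentioned in the table comment can be sketched roughly as follows. This is not HotSpot code: `FlagRow`, `FlagStatus`, and `status_in` are hypothetical names introduced for illustration, release numbers are plain ints with 0 standing in for `JDK_Version::undefined()`, and the column order (deprecated, obsolete, expired) is an assumption read off the quoted row and comment.

```cpp
#include <cassert>

// Illustrative stand-in for one row of the table; 0 means "undefined".
// Assumption: the three version columns are deprecated, obsolete, expired.
struct FlagRow {
  const char* name;
  int deprecated_in;
  int obsolete_in;
  int expired_in;
};

enum class FlagStatus { Active, Deprecated, Obsolete, Expired };

// Classify a flag for a given major release by checking the latest
// applicable stage first.
FlagStatus status_in(const FlagRow& row, int release) {
  if (row.expired_in != 0 && release >= row.expired_in) {
    return FlagStatus::Expired;
  }
  if (row.obsolete_in != 0 && release >= row.obsolete_in) {
    return FlagStatus::Obsolete;
  }
  if (row.deprecated_in != 0 && release >= row.deprecated_in) {
    return FlagStatus::Deprecated;
  }
  return FlagStatus::Active;
}

// The row from the review: the deprecation step is skipped, leaving
// only the obsolete and expire columns filled in (the 2-step model).
const FlagRow use_vtable_based_cha = { "UseVtableBasedCHA", 0, 24, 25 };
```

Under these assumptions the flag is classified as obsolete in release 24 and expired in release 25, without ever passing through a deprecated stage.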
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19911#discussion_r1655468297 From coleenp at openjdk.org Wed Jun 26 20:21:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 26 Jun 2024 20:21:10 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: <8DkQtn1MHl-KJPxmCAK3AqFvJLlSzAxs8RzIYOy3bSk=.03bfd636-2db5-4c06-973d-aff8ff496fd9@github.com> On Wed, 26 Jun 2024 17:47:57 GMT, Vladimir Ivanov wrote: > JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. > > Testing: hs-tier1 - hs-tier6 Looks good! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19911#pullrequestreview-2142870633 From kvn at openjdk.org Thu Jun 27 01:27:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 27 Jun 2024 01:27:08 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: <_bkPt1j_Z_2J04OOgUhM33iYh6k1nu9hClCPVLPv154=.6dfd420f-ebe3-48fe-ab86-a65e4618c651@github.com> On Tue, 25 Jun 2024 23:36:18 GMT, Dean Long wrote: >> The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. >> The test continuously deoptimize and recompile `java.lang.Throwable::` method. >> `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. >> These messages are unique for each call to `verify_oop()` because they are constructed locally. 
>> I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: >> >> Without VerifyOops: >> External addresses table: 38 entries >> >> With VerifyOops: >> External addresses table: 125922 entries >> >> Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). >> Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. >> >> Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. >> >> I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: >> >> External addresses table: 42 entries >> >> >> Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) > > I forgot about spec_for_immediate(). I think it will work. For relocating in Leyden, you may need to enhance > Relocation::pd_get_address_from_code() to recognize pushptr(). @dean-long, are you fine with latest version? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2192881168 From amitkumar at openjdk.org Thu Jun 27 02:55:11 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 27 Jun 2024 02:55:11 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 19:49:29 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> add2reg -> z_la > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3452: > >> 3450: z_lgr(Z_ARG4, r_result); >> 3451: const char* msg = "mismatch"; >> 3452: load_const_optimized(Z_ARG5, (address)msg); > > Did you test this? It breaks when you have a register collision (see my assert_different_registers on PPC64). You can test it by removing the `z_bre` above and checking if the arguments are correct. I found one log, from when I was implementing this: fatal error: mismatch: java.lang.Integer implements java.util.concurrent.Callable: is_subtype_of: 0; linear_search: 0; table_lookup: 1 Context: repne_scan wasn't working properly, table_lookup came up to "1" and repne_scan returned "0" so check failed. For surety let me just add the asserts as well and see what happens. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1655981907 From dlong at openjdk.org Thu Jun 27 03:21:14 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 27 Jun 2024 03:21:14 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out [v2] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 02:33:25 GMT, Vladimir Kozlov wrote: >> The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. >> The test continuously deoptimize and recompile `java.lang.Throwable::` method. >> `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. 
>> These messages are unique for each call to `verify_oop()` because they are constructed locally. >> I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: >> >> Without VerifyOops: >> External addresses table: 38 entries >> >> With VerifyOops: >> External addresses table: 125922 entries >> >> Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). >> Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. >> >> Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. >> >> I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: >> >> External addresses table: 42 entries >> >> >> Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use external_word_Relocation::spec_for_immediate() instead of relocInfo::none Looks good. ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19871#pullrequestreview-2143801743 From kvn at openjdk.org Thu Jun 27 03:38:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 27 Jun 2024 03:38:17 GMT Subject: RFR: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out [v2] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 02:33:25 GMT, Vladimir Kozlov wrote: >> The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. >> The test continuously deoptimize and recompile `java.lang.Throwable::` method. >> `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. >> These messages are unique for each call to `verify_oop()` because they are constructed locally. >> I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: >> >> Without VerifyOops: >> External addresses table: 38 entries >> >> With VerifyOops: >> External addresses table: 125922 entries >> >> Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). >> Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. >> >> Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. >> >> I suggest to use AddressLiteral with RelocInfo::none for x86. 
With that the global table is small even with -XX:+VerifyOops: >> >> External addresses table: 42 entries >> >> >> Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use external_word_Relocation::spec_for_immediate() instead of relocInfo::none Thank you, Dean and Tobias, for reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19871#issuecomment-2193313905 From kvn at openjdk.org Thu Jun 27 03:38:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 27 Jun 2024 03:38:18 GMT Subject: Integrated: 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 23:40:22 GMT, Vladimir Kozlov wrote: > The timeout is cause by running the test with `-Xcomp -XX:+VerifyOops -XX:+PatchALot`. > The test continuously deoptimize and recompile `java.lang.Throwable::` method. > `-XX:+VerifyOops ` adds a lot of external addresses because it use ExternallAddress for error messages. > These messages are unique for each call to `verify_oop()` because they are constructed locally. > I reduced number of loop iteration by 10 to get reasonable execution time (2 mins instead of 20 mins) and got next data: > > Without VerifyOops: > External addresses table: 38 entries > > With VerifyOops: > External addresses table: 125922 entries > > Looks like most of the time VM is spending to grow/reallocate big growable array added by [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819). > Before that change these addresses were recorded locally in nmethod's relocation info. When nmethod was deoptimized, its relocation info was discarded together with nmethod. > > Only on x86 we declared message address as ExternalAddress. Aarch64 uses movptr() to store address as simple pointer. 
ARM uses own InlineAddress with RelocInfo::none type of relocation. I verified other platforms: none is using ExternalAddress. > > I suggest to use AddressLiteral with RelocInfo::none for x86. With that the global table is small even with -XX:+VerifyOops: > > External addresses table: 42 entries > > > Tested tier1, run test with corresponding flags to verify that time is similar to before [JDK-8333819](https://bugs.openjdk.org/browse/JDK-8333819) This pull request has now been integrated. Changeset: 6682305e Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/6682305ee21cf595ec953d95bea594734a2982a8 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8334779: Test compiler/c1/CanonicalizeArrayLength.java is timing out Reviewed-by: thartmann, dlong ------------- PR: https://git.openjdk.org/jdk/pull/19871 From kbarrett at openjdk.org Thu Jun 27 04:16:12 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 27 Jun 2024 04:16:12 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia Coming late to the party, since y'all forgot the 24 hour rule for integrating changes, unless there is mutual agreement that it's a trivial change. Change looks good, and indeed I'd have agreed with a suggestion that it's trivial. I think fixing this kind of thing is worthwhile since the template parameters add nothing and just promote confusion. (Which is likely why the later standard removed them.) 
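
As a minimal sketch of the C++20 restriction discussed above: `Box` is a hypothetical class template made up for illustration, not the code touched by JDK-8333658 or JDK-8335108. It shows the spelling that stays valid in C++20, with the older template-id form kept only in comments.

```cpp
#include <cassert>

// Hypothetical class template, for illustration only.
template <typename T>
class Box {
 public:
  // Older code sometimes declared these as `Box<T>(T v)` and `~Box<T>()`.
  // C++20 no longer allows a template-id to name the constructor or
  // destructor in its declaration, so the plain class name is used here.
  explicit Box(T v) : _value(v) {}
  ~Box() {}

  T value() const { return _value; }

 private:
  T _value;
};
```

Inside the class, the plain name `Box` already refers to the current specialization, which is why the explicit template parameters add nothing.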
We might use C++20 someday, but I'm guessing not soon; there are a fair number of changes that impact us. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2193625497 From jwaters at openjdk.org Thu Jun 27 05:29:17 2024 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 27 Jun 2024 05:29:17 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia I guess you learn something new every day, since this is the first time I'm hearing about the 24 hour rule (I always thought the only HotSpot requirement was 2 Reviewers, besides needing a 2 week waiting periods for Style Guide changes, never knew you had to wait 24 hours for regular ones too). 
It's a good thing this change was trivial in the end ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2193751661 From kbarrett at openjdk.org Thu Jun 27 05:57:13 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 27 Jun 2024 05:57:13 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Thu, 27 Jun 2024 05:26:12 GMT, Julian Waters wrote: > I guess you learn something new every day, since this is the first time I'm hearing about the 24 hour rule (I always thought the only HotSpot requirement was 2 Reviewers, besides needing a 2 week waiting periods for Style Guide changes, never knew you had to wait 24 hours for regular ones too). It's a good thing this change was trivial in the end It's a JDK rule, not specific to HotSpot. https://openjdk.org/guide/ Life of a PR 6. Allow enough time for review https://github.com/openjdk/guide/blame/95f1cd24c657c7837c359c7ba1a80e15319bcd15/src/guide/working-with-pull-requests.md#L77-L80 I don't remember where that requirement originally came from, before being incorporated into the guide. I thought there was someplace old but public that prescribed 1 Reviewer and 24 hours, but can't find it right now. Maybe it was in the openjdk.org contributing or sponsoring pages and got cleaned up when added to the guide? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2193860942 From stuefe at openjdk.org Thu Jun 27 06:19:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Jun 2024 06:19:19 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Thu, 27 Jun 2024 05:54:43 GMT, Kim Barrett wrote: > I guess you learn something new every day, since this is the first time I'm hearing about the 24 hour rule (I always thought the only HotSpot requirement was 2 Reviewers, besides needing a 2 week waiting periods for Style Guide changes, never knew you had to wait 24 hours for regular ones too). It's a good thing this change was trivial in the end It's a rule that makes a lot of sense. Especially nowadays, post the xz-fiasco. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2193886056 From fyang at openjdk.org Thu Jun 27 06:43:13 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 27 Jun 2024 06:43:13 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v16] In-Reply-To: <134O9aga12QdNZ6_n7eDAmaQdP2BIZEsyFE96LIiFgE=.e4ce4c70-1505-4f73-a16c-7dcea05905f1@github.com> References: <134O9aga12QdNZ6_n7eDAmaQdP2BIZEsyFE96LIiFgE=.e4ce4c70-1505-4f73-a16c-7dcea05905f1@github.com> Message-ID: On Wed, 26 Jun 2024 08:56:27 GMT, Robbin Ehn wrote: >> Sounds reasonable. I guess this change won't affect performance much as it's call to the slow path stub. I will do JMH for this case to see. > > Thanks! No obvious impact on test/micro/org/openjdk/bench/vm/lang/SecondarySupersLookup.java. So I think we are safe with this change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1656542284 From mbaesken at openjdk.org Thu Jun 27 06:55:14 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Jun 2024 06:55:14 GMT Subject: RFR: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' [v3] In-Reply-To: References: Message-ID: <_BVhQz0wU13FRokC3PfQ0S6xYtkxm566TghaaipZuu0=.528bc7a7-f8e6-4035-a980-903408ee8b6a@github.com> On Wed, 26 Jun 2024 12:57:21 GMT, Matthias Baesken wrote: >> With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : >> >> runtime/CommandLine/PrintClasses_id0.jtr >> >> src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' >> #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 >> #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0xffffae5a7fd8 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 >> #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #11 0xffffadba1b0c in 
thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 >> #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) >> #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add some blanks removed before Hi Coleen and Stefan, thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19885#issuecomment-2193932098 From mbaesken at openjdk.org Thu Jun 27 06:55:14 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 27 Jun 2024 06:55:14 GMT Subject: Integrated: 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 15:00:03 GMT, Matthias Baesken wrote: > With ubsan enabled binaries we run on Linux aarch64 and Linux x86_64 into this issue : > > runtime/CommandLine/PrintClasses_id0.jtr > > src/hotspot/share/oops/instanceKlass.cpp:3603:84: runtime error: member call on null pointer of type 'struct AnnotationArray' > #0 0xfffface09b40 in InstanceKlass::print_on(outputStream*) const src/hotspot/share/oops/instanceKlass.cpp:3603 > #1 0xffffacdcd088 in PrintClassClosure::do_klass(Klass*) src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0xffffac464200 in ClassLoaderData::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0xffffac475c4c in ClassLoaderDataGraph::classes_do(KlassClosure*) src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0xffffac7bc4f4 in VM_PrintClasses::doit() src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0xffffae599c88 in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0xffffae5a5a14 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > #7 0xffffae5a779c in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > #8 0xffffae5a7fd8 
in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > #9 0xffffae5a80bc in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > #10 0xffffae396958 in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #11 0xffffadba1b0c in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > #12 0xffffb1a9d5c4 (/lib/aarch64-linux-gnu/libc.so.6+0x7d5c4) > #13 0xffffb1b05ed8 (/lib/aarch64-linux-gnu/libc.so.6+0xe5ed8) This pull request has now been integrated. Changeset: 46b817b7 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/46b817b7499e74ba8812d38bcce93147ebf93b25 Stats: 34 lines in 2 files changed: 9 ins; 15 del; 10 mod 8333363: ubsan: instanceKlass.cpp: runtime error: member call on null pointer of type 'struct AnnotationArray' Reviewed-by: coleenp, stefank ------------- PR: https://git.openjdk.org/jdk/pull/19885 From dholmes at openjdk.org Thu Jun 27 07:07:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Jun 2024 07:07:12 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 12:45:22 GMT, Thomas Stuefe wrote: >> Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. 
As usual with hex dumps, ascii follows hex printout >> >> Example: >> >> >> >> 118 0x00000000000001c0: 204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820 OpenJDK 64-Bit Server VM (fastde >> 119 0x00000000000001e0: 692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f bug 24-internal-adhoc.thomas.sou >> 120 0x0000000000000200: 726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820 rce) for linux-amd64 JRE (24-int >> 121 0x0000000000000220: 64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965 ernal-adhoc.thomas.source), buil >> 122 0x0000000000000240: 323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a t on 2024-06-20T10:45:12Z with g >> 123 0x0000000000000260: 2e352e3031206363 0000000000000030 0000000000000000 0000000000000000 cc 10.5.0_______________________ >> 124 0x0000000000000280: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ >> 125 0x00000000000002a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ >> >> >> The patch does that. >> >> Small unrelated changes: >> >> - I rewrote and extended the gtests, testing now a real-life printout containing a mixture or readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved. >> >> - The new test uncovered an issue on 32-bit when printing giant words. We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t). >> >> - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. 
os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar. >> >> ---- >> >> Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-e... > > Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision: > > - copyrights > - const_address instead of const uint8_t* > - use dot instead of underscore for unprintable > - rely on default true > - Revert "fix copyrights" > > This reverts commit 2b8bc55e53a88d13cd268dc89ebaac7fe42f60d5. Okay - seems reasonable. FYI I am away for a few days. Thanks src/hotspot/share/runtime/os.cpp line 949: > 947: uintptr_t i = (uintptr_t)SafeFetchN((intptr_t*)p, errval); > 948: if (i == errval) { > 949: i = (uintptr_t)SafeFetchN((intptr_t*)p, ~errval); Pre-existing but if the initial fetch fails why do we think the second one can succeed ??? src/hotspot/share/runtime/os.cpp line 964: > 962: uint8_t c[sizeof(v)]; > 963: } u = { value }; > 964: for (int i = 0; i < unitsize; i ++) { Nit: no space before unary operator ++ ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19835#pullrequestreview-2144419073 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1656562653 PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1656564416 From rehn at openjdk.org Thu Jun 27 07:13:12 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Jun 2024 07:13:12 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v16] In-Reply-To: References: <134O9aga12QdNZ6_n7eDAmaQdP2BIZEsyFE96LIiFgE=.e4ce4c70-1505-4f73-a16c-7dcea05905f1@github.com> Message-ID: On Thu, 27 Jun 2024 06:40:11 GMT, Fei Yang wrote: >> Thanks! 
>
> No obvious impact on test/micro/org/openjdk/bench/vm/lang/SecondarySupersLookup.java. So I think we are safe with this change.

Great, thanks, then we take all of these candidates later!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1656578150

From stuefe at openjdk.org Thu Jun 27 07:30:12 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 27 Jun 2024 07:30:12 GMT
Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Jun 2024 06:57:49 GMT, David Holmes wrote:

>> Thomas Stuefe has updated the pull request incrementally with five additional commits since the last revision:
>>
>> - copyrights
>> - const_address instead of const uint8_t*
>> - use dot instead of underscore for unprintable
>> - rely on default true
>> - Revert "fix copyrights"
>>
>> This reverts commit 2b8bc55e53a88d13cd268dc89ebaac7fe42f60d5.
>
> src/hotspot/share/runtime/os.cpp line 949:
>
>> 947: uintptr_t i = (uintptr_t)SafeFetchN((intptr_t*)p, errval);
>> 948: if (i == errval) {
>> 949: i = (uintptr_t)SafeFetchN((intptr_t*)p, ~errval);
>
> Pre-existing but if the initial fetch fails why do we think the second one can succeed ???

There is a one-in-2^(32|64) chance that the numerical value of errval happened to be in memory. By reading twice, with a different errval each time, we diminish the chance of mistaking a successful read for an error.
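
The double read described above can be sketched outside HotSpot roughly as follows. This is only a minimal illustration under stated assumptions, not the code in os.cpp: `probe_read` is a hypothetical stand-in for `SafeFetchN` (it treats a null pointer as the unreadable address), and `read_word_checked` and the sentinel value are names invented for this sketch.

```cpp
#include <cstdint>

// Hypothetical stand-in for SafeFetchN: returns *p if p is "readable",
// otherwise returns errval. In this sketch only nullptr is unreadable.
static uintptr_t probe_read(const uintptr_t* p, uintptr_t errval) {
  return (p != nullptr) ? *p : errval;
}

// Reads one word from p. Reports failure through *ok only if both probes,
// each using a different sentinel, return their own sentinel.
static uintptr_t read_word_checked(const uintptr_t* p, bool* ok) {
  const uintptr_t errval = 0xCAFEBABEu;  // arbitrary sentinel (assumption)
  uintptr_t v = probe_read(p, errval);
  if (v == errval) {
    // Ambiguous result: either the read failed, or memory really holds
    // errval. Probe again with the complemented sentinel to tell them apart.
    v = probe_read(p, ~errval);
    if (v == ~errval) {
      *ok = false;  // both sentinels came back: treat as unreadable
      return 0;
    }
  }
  *ok = true;
  return v;
}
```

A word that happens to contain the first sentinel is still reported as readable, because the second probe returns the stored value rather than the complemented sentinel.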
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19835#discussion_r1656601176 From dholmes at openjdk.org Thu Jun 27 07:49:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Jun 2024 07:49:13 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia I will claim the 24 hour rule is a generalization of "don't integrate before David in Australia has a chance to take a look" :D ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2194020217 From dholmes at openjdk.org Thu Jun 27 08:03:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Jun 2024 08:03:12 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:47:57 GMT, Vladimir Ivanov wrote: > JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. > > Testing: hs-tier1 - hs-tier6 LGTM2 ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19911#pullrequestreview-2144571669 From stuefe at openjdk.org Thu Jun 27 08:05:42 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Jun 2024 08:05:42 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v3] In-Reply-To: References: Message-ID: > Motivated by analyzing CDS dump differences for reproducible builds, I found an optional ASCII printout to be valuable. As usual with hex dumps, ascii follows hex printout > > Example: > > > > 118 0x00000000000001c0: 204b444a6e65704f 53207469422d3436 4d56207265767265 6564747361662820 OpenJDK 64-Bit Server VM (fastde > 119 0x00000000000001e0: 692d343220677562 2d6c616e7265746e 68742e636f686461 756f732e73616d6f bug 24-internal-adhoc.thomas.sou > 120 0x0000000000000200: 726f662029656372 612d78756e696c20 45524a203436646d 746e692d34322820 rce) for linux-amd64 JRE (24-int > 121 0x0000000000000220: 64612d6c616e7265 6d6f68742e636f68 6372756f732e7361 6c697562202c2965 ernal-adhoc.thomas.source), buil > 122 0x0000000000000240: 323032206e6f2074 5430322d36302d34 32313a35343a3031 672068746977205a t on 2024-06-20T10:45:12Z with g > 123 0x0000000000000260: 2e352e3031206363 0000000000000030 0000000000000000 0000000000000000 cc 10.5.0_______________________ > 124 0x0000000000000280: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ > 125 0x00000000000002a0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ________________________________ > > > The patch does that. > > Small unrelated changes: > > - I rewrote and extended the gtests, testing now a real-life printout containing a mixture or readable and non-readable pages, and printable and non-printable characters. I re-enabled tests on Windows, since https://bugs.openjdk.org/browse/JDK-8185734 is long solved. > > - The new test uncovered an issue on 32-bit when printing giant words. 
We shift a signed value by 32 bits upwards, which can result in -1 resp. ffffffff in the upper half of the giant word. One of the pitfalls of intptr_t vs uintptr_t (I think most uses of intptr_t should probably use uintptr_t). > > - I got tired of casting constness away from to-be-printed memory range just to be able to feed an address to os::print_hex_dump. The content printed is usually const. os::print_hex_dump does not need non-constness, but since we use address, and address is typedef char*, and one cannot declare a typedef'ed pointer target-const, the issue is there. I therefore changed the input to const uint8_t*. Maybe we need a const_address or something similar. > > ---- > > Ran tests on Linux x64 and x86, Windows x86 and Mac aarch64. Fixed all issues I found. Only little-endian, I don't have big-endian machines and therefore made those changes blindly. ... Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: exclude test for AIX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19835/files - new: https://git.openjdk.org/jdk/pull/19835/files/648c4d4f..ad4646b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19835&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19835&range=01-02 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19835/head:pull/19835 PR: https://git.openjdk.org/jdk/pull/19835 From sroy at openjdk.org Thu Jun 27 08:08:11 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 27 Jun 2024 08:08:11 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v2] In-Reply-To: <-SRQP2arPvQSbKpKXBvKxo4OQwWK-O_vPPbVxyhR8Bg=.bf68f7aa-5466-426c-ac9f-78ce7eb8cf7c@github.com> References: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> 
<-SRQP2arPvQSbKpKXBvKxo4OQwWK-O_vPPbVxyhR8Bg=.bf68f7aa-5466-426c-ac9f-78ce7eb8cf7c@github.com> Message-ID: On Wed, 26 Jun 2024 16:55:28 GMT, Amit Kumar wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> empty lines > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 356: > >> 354: else >> 355: cmpwi(CCR0, dst, 0); >> 356: setbcr(dst, CCR0, Assembler::zero); > > This is what I understood after implementation & definition: > > If bit BI of the CR contains a 1, register RT is set to 0. Otherwise, register RT is set to 1. > > CCR0 will contain `1` when `dst == 0`. then `dst` will be set to `1` by `setbcr`. Yes the bit related to value of zero will be set. and setbcr will return 0 , if value is 1 in the CCR0. So the return register will have a value of 0, since it is 0. Similarly for non zero value, setbcr will return 1, and return register will have value of 1, which is the expected behaviour. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656664120 From rehn at openjdk.org Thu Jun 27 08:32:41 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Jun 2024 08:32:41 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v19] In-Reply-To: References: Message-ID: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. 
>
> Code stream:
> JAL
> Stubs:
> AUIPC
> LD
> JALR
>
>
>
> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive.
> Even if you don't have that problem, having a call to a jump is not the fastest way.
> Loading the address avoids the pitfalls of cmodx.
>
> This patch suggests solving the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**,
> and instead do by default:
>
> Code stream:
> AUIPC
> LD
> JALR
> Stubs:
>
>
> An experimental option for turning trampolines back on exists.
>
> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>
> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch):
>
> fop (msec) 2239 | 2128 = 0.950424
> h2 (msec) 18660 | 16594 = 0.889282
> jython (msec) 22022 | 21925 = 0.995595
> luindex (msec) 2866 | 2842 = 0.991626
> lusearch (msec) 4108 | 4311 = 1.04942
> lusearch-fix (msec) 4406 | 4116 = 0.934181
> pmd (msec) 5976 | 5897 = 0.98678
> jython (msec) 22022 | 21925 = 0.995595
> Avg: 0.974112
> fop(xcomp) (msec) 2721 | 2714 = 0.997427
> h2(xcomp) (msec) 37719 | 38004 = 1.00756
> jython(xcomp) ...

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits:

 - Rename lc
 - Merge branch 'master' into 8332689
 - Merge branch 'master' into 8332689
 - Comments
 - Missed in merge-fixes, minor revert
 - Merge branch 'master' into 8332689
 - Minor review comments
 - Merge branch 'master' into 8332689
 - To be pushed
 - Merge branch 'master' into 8332689
 - ...
and 18 more: https://git.openjdk.org/jdk/compare/46b817b7...442680b4 ------------- Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=18 Stats: 897 lines in 16 files changed: 622 ins; 177 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From fyang at openjdk.org Thu Jun 27 08:32:41 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 27 Jun 2024 08:32:41 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v18] In-Reply-To: References: Message-ID: <7tOTPLrblaphlpgB_P32fNjK158nX2v7kZmWFwOqmNo=.89542dbd-eb2c-44e4-bb7f-974ed1eb5728@github.com> On Wed, 26 Jun 2024 09:06:38 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4027: >> >>> 4025: } >>> 4026: >>> 4027: address MacroAssembler::load_call(Address entry) { >> >> How about rename this as something like `indirect_call`? >> >> BTW: I may need to take another look at the update of this PR. Hopefully, I can finish this week. Thank you for your patience. > > The reason I don't like "indirect_call", is that all JALR jumps are indirect: > `The indirect jump instruction JALR (jump and link register)` > > Maybe the middle ground: "indirect_load_call()" ? > > I can live with indirect_call also, as it's indirect from programmers POV vs movptr + jalr. Or `load_and_call`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1656593099 From rehn at openjdk.org Thu Jun 27 08:32:41 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Jun 2024 08:32:41 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v18] In-Reply-To: <7tOTPLrblaphlpgB_P32fNjK158nX2v7kZmWFwOqmNo=.89542dbd-eb2c-44e4-bb7f-974ed1eb5728@github.com> References: <7tOTPLrblaphlpgB_P32fNjK158nX2v7kZmWFwOqmNo=.89542dbd-eb2c-44e4-bb7f-974ed1eb5728@github.com> Message-ID: <2oIfIiN35AunMAuOuACv8_9JNw7D2Qm7LPFPmSE6S_I=.570892d3-76cf-4571-bddd-8ecaa88b4013@github.com> On Thu, 27 Jun 2024 07:22:22 GMT, Fei Yang wrote: >> The reason I don't like "indirect_call", is that all JALR jumps are indirect: >> `The indirect jump instruction JALR (jump and link register)` >> >> Maybe the middle ground: "indirect_load_call()" ? >> >> I can live with indirect_call also, as it's indirect from programmers POV vs movptr + jalr. > > Or `load_and_call`? Ok, done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1656698379 From mdoerr at openjdk.org Thu Jun 27 08:37:12 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 08:37:12 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v3] In-Reply-To: <9V7t8GgSok8bSTA5zYzfy23vyFN-mQPL4BtYjraf76Y=.e03244df-b669-4a89-8532-fd524dc5aa54@github.com> References: <9V7t8GgSok8bSTA5zYzfy23vyFN-mQPL4BtYjraf76Y=.e03244df-b669-4a89-8532-fd524dc5aa54@github.com> Message-ID: <3zN7SpsZSDvAbEJbvu9GN-aZMa8Y7h3J7bmcj5-WlQ8=.36af9c3e-ae6e-42dd-9acf-0cf8bb5e20cf@github.com> On Wed, 26 Jun 2024 16:52:35 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). 
>> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > space after comma Changes requested by mdoerr (Reviewer). src/hotspot/cpu/ppc/assembler_ppc.hpp line 1785: > 1783: inline void setnbc(Register d, ConditionRegister cr, Condition cc); > 1784: inline void setbcr( Register d, int biint); > 1785: inline void setbcr( Register d, ConditionRegister cr, Condition cc); Extra whitespace. Please remove. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 346: > 344: return (long) x; > 345: } > 346: // Branch-free implementation to convert !0 to false Please change to "!=0 to 1" src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 348: > 346: // Branch-free implementation to convert !0 to false > 347: // Set register dst to true if dst is non zero using temp for calculations on Power Version<10. > 348: // Set register dst to true if dst is non zero for Power 10 and above machines. Please don't use the term "true". It's not defined in assembler. In addition: We do the same for older processors. Just by other instructions. I think these 2 lines could be removed. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 352: > 350: > 351: if (VM_Version::has_brw()) { > 352: if(use_64bit) Please use hotspot coding style: `if (use_64bit) {` Curly braces should also be used for single statements. 
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2408: > 2406: > 2407: // convert !=0 to 1 > 2408: normalize_bool(result, R0, true); There's a second usage in this function. Please also replace that one. src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 297: > 295: }; > 296: > 297: // Branch-free implementation to convert !0 to false. "!=0" ------------- PR Review: https://git.openjdk.org/jdk/pull/19886#pullrequestreview-2144614270 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656680482 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656684654 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656688689 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656699697 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656705233 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656706013 From mdoerr at openjdk.org Thu Jun 27 08:37:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 08:37:13 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v2] In-Reply-To: <-SRQP2arPvQSbKpKXBvKxo4OQwWK-O_vPPbVxyhR8Bg=.bf68f7aa-5466-426c-ac9f-78ce7eb8cf7c@github.com> References: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> <-SRQP2arPvQSbKpKXBvKxo4OQwWK-O_vPPbVxyhR8Bg=.bf68f7aa-5466-426c-ac9f-78ce7eb8cf7c@github.com> Message-ID: On Wed, 26 Jun 2024 16:56:40 GMT, Amit Kumar wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> empty lines > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 349: > >> 347: // Set register dst to true if dst is non zero using temp for calculations on Power Version<10. >> 348: // Set register dst to true if dst is non zero for Power 10 and above machines. 
>> 349: void MacroAssembler::normalize_bool(Register dst, Register temp, bool use_64bit) {
>
> I would've used `is_64bit/is_long` instead of `use_64bit`. But that's a personal choice and I leave it up to you.

`is_64bit` would be fine with me, too. As you prefer.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1656693812

From gcao at openjdk.org Thu Jun 27 08:53:09 2024
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 27 Jun 2024 08:53:09 GMT
Subject: RFR: 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path
In-Reply-To:
References:
Message-ID:

On Mon, 24 Jun 2024 14:15:12 GMT, Fei Yang wrote:

>> Branch condition for r_array_index wraparound checking in lookup_secondary_supers_table_slow_path is wrong.
>>
>> // Check for wraparound.
>> Label skip;
>> bge(r_array_length, r_array_index, skip);
>> mv(r_array_index, zr);
>> bind(skip);
>>
>> As discussed at https://github.com/openjdk/jdk/pull/19320/files#r1650548279 . If length == index, then we must set index to 0. That is `blt(r_array_index,r_array_length,skip);`.
>>
>> ### Correctness testing:
>> - [x] Run tier1-3 tests on SOPHON SG2042 (release)
>>
>> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb
>> without this patch:
>>
>> SecondarySupersLookup.testNegative00 avgt 15 13.275 ± 0.223 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 13.264 ± 0.201 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 13.261 ± 0.194 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 13.271 ± 0.210 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 13.265 ± 0.201 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 13.258 ± 0.191 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 13.280 ± 0.225 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 13.268 ± 0.201 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 13.266 ± 0.202 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 13.261 ± 0.196 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 13.268 ± 0.198 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 13.268 ± 0.205 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 13.284 ± 0.231 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 13.281 ± 0.226 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 13.273 ± 0.215 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 13.287 ± 0.233 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 13.292 ± 0.242 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 53.064 ± 0.757 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 53.052 ± 0.767 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 53.068 ± 0.803 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 53.076 ± 0.776 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 53.095 ± 0.846 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 75.106 ± 1.033 ns/op
>> SecondarySupersLookup.testNegative61 avgt 15 76.832 ± 4.047 ns/op
>> SecondarySupersLookup.testNegative62 avgt 15 75.085 ± 1.010 ns/op
>> SecondarySupersLookup.test...
>
> Marked as reviewed by fyang (Reviewer).

@RealFYang : Thanks for the review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19852#issuecomment-2194140058 From luhenry at openjdk.org Thu Jun 27 10:07:17 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 27 Jun 2024 10:07:17 GMT Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> Message-ID: On Thu, 30 May 2024 09:13:30 GMT, Gui Cao wrote: > Because some dev boards only support RVV version 0.7, In [JDK-8316859](https://bugs.openjdk.org/browse/JDK-8316859) we masked the use of HWCAP to probe for RVV extensions, and in the meantime, we can use hwprobe to probe for V extensions in Linux kernel 6.5 and above. But recently we got Banana Pi BPI-F3 board (has RVV1.0), but his kernel is 6.1.15, so the V extensions detected by HWCAP are masked. And we get the warning: `RVV is not supported on this CPU` when we enable UseRVV with the command, and we can't enable UseRVV correctly. 
> > Without Patch: > > zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > OpenJDK 64-Bit Server VM warning: RVV is not supported on this CPU > bool UseRVV = false {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = false {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) > > > With Patch: > > zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > bool UseRVV = true {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = true {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) We need to update our check from hwcap with something along the line of: vsetvli t0, zero, e8, m1, ta, ma csrr a0, vtype sgtz a0, a0 ret That will return whether v1.0 is supported since `vsetvli t0, zero, e8, m1, ta, ma` will set the `vill` flag on RVV 0.7. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19472#issuecomment-2194290073 From rehn at openjdk.org Thu Jun 27 10:24:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Jun 2024 10:24:17 GMT Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> Message-ID: On Thu, 27 Jun 2024 10:04:30 GMT, Ludovic Henry wrote: > We need to update our check from hwcap with something along the line of: > > ``` > vsetvli t0, zero, e8, m1, ta, ma > csrr a0, vtype > sgtz a0, a0 > ret > ``` > > That will return whether v1.0 is supported since `vsetvli t0, zero, e8, m1, ta, ma` will set the `vill` flag on RVV 0.7. 
As we can be context switched after vsetvli and thus execute csrr on another hart, without kernel support, the flag will not be set even if it's RVV 0.7. So this check is not reliable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19472#issuecomment-2194324396 From fgao at openjdk.org Thu Jun 27 11:57:11 2024 From: fgao at openjdk.org (Fei Gao) Date: Thu, 27 Jun 2024 11:57:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <2VxBcA-0qxX3N35u5vnKyT920nTH5llf2k5_sKQcqT8=.23823400-536f-458e-baf7-53f99547abc4@github.com> Message-ID: On Thu, 6 Jun 2024 07:52:02 GMT, Hamlin Li wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> update header files for arm > > in progress... Hi @Hamlin-Li , thanks for your work. I tried to run benchmarks, [FloatMaxVector](https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L1068) and [DoubleMaxVector](https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L1068), on different aarch64 machines. Here is the data I got for `TANH`, with args `-i 5 -f 3 -wi 3 -foe true -jvmArgs -Xms4g -Xmx4g -XX:+AlwaysPreTouch -XX:ObjectAlignmentInBytes=16`: // NEON machine Benchmark (size) Mode Cnt Units Perf gain DoubleMaxVector.TANH 1024 thrpt 15 ops/ms -38% FloatMaxVector.TANH 1024 thrpt 15 ops/ms -26% // 128-bit sve machine (TANH also implemented with NEON) Benchmark (size) Mode Cnt Units Perf gain DoubleMaxVector.TANH 1024 thrpt 15 ops/ms -19% FloatMaxVector.TANH 1024 thrpt 15 ops/ms ~00% The performance of vector stubs for `TANH` looks not quite stable on different NEON machines. 
Since this pr does not provide `TANH` interface on sve machines for [the performance regression](https://github.com/openjdk/jdk/pull/16234/commits/2a7730d6acbac80438a43d1502cff6a476f8b5b5#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R8521-R8525), how about also disabling it on NEON for the same reason? WDYT? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2194480996 From tschatzl at openjdk.org Thu Jun 27 12:55:14 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 27 Jun 2024 12:55:14 GMT Subject: [jdk23] RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved In-Reply-To: References: Message-ID: <4cUjizRM35HGTvERXdf8S_aYcs0XbuzRbw2qXG7fSy8=.935bfc02-7b20-45c6-91d9-6615a9bccc14@github.com> On Tue, 25 Jun 2024 07:51:35 GMT, Stefan Karlsson wrote: > Hi all, > > This pull request contains a backport of commit [31e8deba](https://github.com/openjdk/jdk/commit/31e8debae63e008da79e403bcb870a7be631af2c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Liming Liu on 17 Jun 2024 and was reviewed by Stefan Karlsson, Johan Sjölen and Thomas Stuefe. > > Thanks! Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19877#pullrequestreview-2145278288 From aboldtch at openjdk.org Thu Jun 27 12:58:44 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Jun 2024 12:58:44 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v4] In-Reply-To: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: > ClassLoaderDataGraph provides APIs for walking different metadata.
All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. > > This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. > > All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. > > Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. > > Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. > > Currently running tier1-tier8 testing. Axel Boldt-Christmas has updated the pull request incrementally with five additional commits since the last revision: - Rename iterator - Add SystemDictionary Comment - Revert "Rename and comment SystemDictionary::methods_do" This reverts commit 5f29cfbe79a741552683ff491360a0c920315747. - Revert "Fixup comments after renaming" This reverts commit 08366b1244775e5892bbbb184660821e8774f37a. - Revert "Rename and comment SystemDictionary::methods_do" This reverts commit 5f29cfbe79a741552683ff491360a0c920315747. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19769/files - new: https://git.openjdk.org/jdk/pull/19769/files/5f29cfbe..86bb89b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=02-03 Stats: 74 lines in 23 files changed: 0 ins; 2 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/19769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19769/head:pull/19769 PR: https://git.openjdk.org/jdk/pull/19769 From coleenp at openjdk.org Thu Jun 27 13:20:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Jun 2024 13:20:12 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v4] In-Reply-To: References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Thu, 27 Jun 2024 12:58:44 GMT, Axel Boldt-Christmas wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. >> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. 
>> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. > > Axel Boldt-Christmas has updated the pull request incrementally with five additional commits since the last revision: > > - Rename iterator > - Add SystemDictionary Comment > - Revert "Rename and comment SystemDictionary::methods_do" > > This reverts commit 5f29cfbe79a741552683ff491360a0c920315747. > - Revert "Fixup comments after renaming" > > This reverts commit 08366b1244775e5892bbbb184660821e8774f37a. > - Revert "Rename and comment SystemDictionary::methods_do" > > This reverts commit 5f29cfbe79a741552683ff491360a0c920315747. I like this change a lot better. If we run into more places where the oops are escaped from the CLD and need to be kept alive maybe we could try to make this more fool-proof, but it's really subtle. The places that do not resolve() the holder look good. src/hotspot/share/classfile/classLoaderDataGraph.cpp line 250: > 248: // that its CLD's holder is kept alive if they escape the > 249: // caller's safepoint or ClassLoaderDataGraph_lock > 250: // critical section. I like this comment. nit. can you take out the word "soon" since it's nearby. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2145348782 PR Review Comment: https://git.openjdk.org/jdk/pull/19769#discussion_r1657123132 From aboldtch at openjdk.org Thu Jun 27 13:38:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Jun 2024 13:38:22 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v5] In-Reply-To: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: <0bmU9g-_tq5Z7JyQEeg72NU0wThVz-3xF4G4VvLp7uQ=.cebebec1-30b9-4627-8dfc-c158dd983a56@github.com> > ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. > > This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. > > All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. > > Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. > > Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. > > Currently running tier1-tier8 testing. 
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Cleanup soon ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19769/files - new: https://git.openjdk.org/jdk/pull/19769/files/86bb89b0..526aa3a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19769&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19769/head:pull/19769 PR: https://git.openjdk.org/jdk/pull/19769 From stuefe at openjdk.org Thu Jun 27 13:59:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Jun 2024 13:59:17 GMT Subject: RFR: 8329665: fatal error: memory leak: allocating without ResourceMark [v2] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 14:53:12 GMT, Patricio Chilano Mateo wrote: >> There are two places in Loom code that call f.oops_interpreted_do() to process oops in the stackChunk. Although not obvious this method seem to require to have a ResourceMark on scope and there are several contexts where these two are call where we don't have one. The reason why a ResourceMark is needed is because OopMapCache::compute_one_oop_map() might allocate from the resource area if _mask_size is > 4 * BitsPerWord, which depends on the amount of locals + expression stack of the corresponding method. But ~InterpreterOopMap already checks if the _bit_mask was allocated in the resource area and in that case it will free it. So the ResourceMark is not strictly needed except that in debug mode we will actually hit the assert if there is not one in scope when trying to allocate the _bit_mask. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > take ResourceMark out of debug only I am a bit concerned about this fix. 
Introducing an RM into `frame::oops_interpreted_do` means we cannot assemble anything in RA in the closure code and keep the memory across the RM. But closure code is opaque to the iteration site. Do we have any safeguards against OopClosure using and retaining RA memory? (Because even if no closure does this today, this could sneak in easily) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2194755025 From stefank at openjdk.org Thu Jun 27 14:04:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Jun 2024 14:04:12 GMT Subject: [jdk23] RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 07:51:35 GMT, Stefan Karlsson wrote: > Hi all, > > This pull request contains a backport of commit [31e8deba](https://github.com/openjdk/jdk/commit/31e8debae63e008da79e403bcb870a7be631af2c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Liming Liu on 17 Jun 2024 and was reviewed by Stefan Karlsson, Johan Sjölen and Thomas Stuefe. > > Thanks! Thanks Thomas! Tier1 passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19877#issuecomment-2194763923 From stefank at openjdk.org Thu Jun 27 14:04:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Jun 2024 14:04:13 GMT Subject: [jdk23] Integrated: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 07:51:35 GMT, Stefan Karlsson wrote: > Hi all, > > This pull request contains a backport of commit [31e8deba](https://github.com/openjdk/jdk/commit/31e8debae63e008da79e403bcb870a7be631af2c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by Liming Liu on 17 Jun 2024 and was reviewed by Stefan Karlsson, Johan Sjölen and Thomas Stuefe. > > Thanks! This pull request has now been integrated. Changeset: d7b94542 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/d7b9454205af3eae4d2f4422893feeef4836b910 Stats: 13 lines in 3 files changed: 4 ins; 3 del; 6 mod 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved 8325218: gc/parallel/TestAlwaysPreTouchBehavior.java fails Reviewed-by: tschatzl Backport-of: 31e8debae63e008da79e403bcb870a7be631af2c ------------- PR: https://git.openjdk.org/jdk/pull/19877 From szaldana at openjdk.org Thu Jun 27 14:23:17 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 27 Jun 2024 14:23:17 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: <4YrIIT4sUKXXKbwGn4w2Uz5dc19fa3wrCIhauvN_nFM=.b4df23dc-d68e-42e9-805b-d7e34d3e8feb@github.com> On Thu, 27 Jun 2024 05:54:43 GMT, Kim Barrett wrote: >> I guess you learn something new every day, since this is the first time I'm hearing about the 24 hour rule (I always thought the only HotSpot requirement was 2 Reviewers, besides needing a 2 week waiting periods for Style Guide changes, never knew you had to wait 24 hours for regular ones too). It's a good thing this change was trivial in the end > >> I guess you learn something new every day, since this is the first time I'm hearing about the 24 hour rule (I always thought the only HotSpot requirement was 2 Reviewers, besides needing a 2 week waiting periods for Style Guide changes, never knew you had to wait 24 hours for regular ones too). It's a good thing this change was trivial in the end > > It's a JDK rule, not specific to HotSpot. > https://openjdk.org/guide/ > Life of a PR > 6.
Allow enough time for review > > https://github.com/openjdk/guide/blame/95f1cd24c657c7837c359c7ba1a80e15319bcd15/src/guide/working-with-pull-requests.md#L77-L80 > > I don't remember where that requirement originally came from, before being incorporated into the guide. I thought > there was someplace old but public that prescribed 1 Reviewer and 24 hours, but can't find it right now. Maybe it was in the openjdk.org contributing or sponsoring pages and got cleaned up when added to the guide? @kimbarrett Understood, thanks for bringing that to my attention. Luckily the change was trivial this time but I will note that for the future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2194834971 From aboldtch at openjdk.org Thu Jun 27 14:24:17 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Jun 2024 14:24:17 GMT Subject: RFR: 8326820: Metadata artificially kept alive [v5] In-Reply-To: <0bmU9g-_tq5Z7JyQEeg72NU0wThVz-3xF4G4VvLp7uQ=.cebebec1-30b9-4627-8dfc-c158dd983a56@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> <0bmU9g-_tq5Z7JyQEeg72NU0wThVz-3xF4G4VvLp7uQ=.cebebec1-30b9-4627-8dfc-c158dd983a56@github.com> Message-ID: On Thu, 27 Jun 2024 13:38:22 GMT, Axel Boldt-Christmas wrote: >> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. >> >> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. 
>> >> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. >> >> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. >> >> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. >> >> Currently running tier1-tier8 testing. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup soon Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19769#issuecomment-2194835926 From aboldtch at openjdk.org Thu Jun 27 14:24:18 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Jun 2024 14:24:18 GMT Subject: Integrated: 8326820: Metadata artificially kept alive In-Reply-To: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> References: <1xvsDMD50nAALDma6JqSvqNVfYl1KnN0ihFdkztDHYE=.97983fa3-6feb-4ee6-b578-c76e326bcc33@github.com> Message-ID: On Tue, 18 Jun 2024 12:25:36 GMT, Axel Boldt-Christmas wrote: > ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed. > > This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive. 
> > All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`. > > Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur. > > Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not. > > Currently running tier1-tier8 testing. This pull request has now been integrated. Changeset: 5909d541 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/5909d54147355dd7da5786ff39ead4c15816705c Stats: 80 lines in 6 files changed: 31 ins; 25 del; 24 mod 8326820: Metadata artificially kept alive Reviewed-by: eosterlund, stefank, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/19769 From aboldtch at openjdk.org Thu Jun 27 14:36:33 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Jun 2024 14:36:33 GMT Subject: [jdk23] RFR: 8326820: Metadata artificially kept alive Message-ID: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com> Hi all, This pull request contains a backport of commit [5909d541](https://github.com/openjdk/jdk/commit/5909d54147355dd7da5786ff39ead4c15816705c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Axel Boldt-Christmas on 27 Jun 2024 and was reviewed by Erik Österlund, Stefan Karlsson and Coleen Phillimore. Thanks!
------------- Commit messages: - Backport 5909d54147355dd7da5786ff39ead4c15816705c Changes: https://git.openjdk.org/jdk/pull/19929/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19929&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326820 Stats: 80 lines in 6 files changed: 31 ins; 25 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/19929.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19929/head:pull/19929 PR: https://git.openjdk.org/jdk/pull/19929 From stuefe at openjdk.org Thu Jun 27 15:04:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 27 Jun 2024 15:04:13 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Thu, 27 Jun 2024 07:46:11 GMT, David Holmes wrote: > I will claim the 24 hour rule is a generalization of "don't integrate before David in Australia has a chance to take a look" :D I will now call it the David-Rule. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2194956709 From stefank at openjdk.org Thu Jun 27 15:18:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 27 Jun 2024 15:18:09 GMT Subject: [jdk23] RFR: 8326820: Metadata artificially kept alive In-Reply-To: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com> References: <9401T6FMpCnxvfgCCxHR-7-wEcwchAqf_ETKFbQSXg0=.096ce771-f64a-4e41-bb3a-94a1b232965c@github.com> Message-ID: On Thu, 27 Jun 2024 14:30:43 GMT, Axel Boldt-Christmas wrote: > Hi all, > > This pull request contains a backport of commit [5909d541](https://github.com/openjdk/jdk/commit/5909d54147355dd7da5786ff39ead4c15816705c) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Axel Boldt-Christmas on 27 Jun 2024 and was reviewed by Erik Österlund, Stefan Karlsson and Coleen Phillimore.
> > Thanks! Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19929#pullrequestreview-2145723893 From amitkumar at openjdk.org Thu Jun 27 15:48:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 27 Jun 2024 15:48:12 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 19:46:04 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> add2reg -> z_la > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3414: > >> 3412: #ifdef ASSERT >> 3413: { >> 3414: // r_result should have either 0 or 1 value > > Could be done with one check (clear bit 0). I guess this check requires the r_result to be in range `[0,1]`. So it checks (without modifying) whether the value is greater than or equal to 0 & less than equal to 1. By "clear bit 0" did you mean to `and` it with `1` and then do the check ? I'm really not sure what were your thoughts. Could you please elaborate ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1657369732 From pchilanomate at openjdk.org Thu Jun 27 15:52:18 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 27 Jun 2024 15:52:18 GMT Subject: RFR: 8329665: fatal error: memory leak: allocating without ResourceMark [v2] In-Reply-To: References: Message-ID: <7xFWgWV0kVOSnQeQpQRMhAFXFJYvkVLPoqqXyV3KDZs=.e82e5500-fe15-4fe8-95d0-1cb2b728660f@github.com> On Thu, 27 Jun 2024 13:56:15 GMT, Thomas Stuefe wrote: > I am a bit concerned about this fix. Introducing an RM into `frame::oops_interpreted_do` means we cannot assemble anything in RA in the closure code and keep the memory across the RM. But closure code is opaque to the iteration site. Do we have any safeguards against OopClosure using and retaining RA memory? 
(Because even if no closure does this today, this could sneak in easily) > Yes, we can use RM inside the closure, and we do in some cases, but the case you mentioned would be problematic. Since that would be a special usage case I think it should be okay to use C heap instead. But I don't know how to guard against allocating and retaining RA memory in the closure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18632#issuecomment-2195067808 From sroy at openjdk.org Thu Jun 27 16:02:11 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 27 Jun 2024 16:02:11 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v3] In-Reply-To: <3zN7SpsZSDvAbEJbvu9GN-aZMa8Y7h3J7bmcj5-WlQ8=.36af9c3e-ae6e-42dd-9acf-0cf8bb5e20cf@github.com> References: <9V7t8GgSok8bSTA5zYzfy23vyFN-mQPL4BtYjraf76Y=.e03244df-b669-4a89-8532-fd524dc5aa54@github.com> <3zN7SpsZSDvAbEJbvu9GN-aZMa8Y7h3J7bmcj5-WlQ8=.36af9c3e-ae6e-42dd-9acf-0cf8bb5e20cf@github.com> Message-ID: On Thu, 27 Jun 2024 08:22:38 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> space after comma > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 348: > >> 346: // Branch-free implementation to convert !0 to false >> 347: // Set register dst to true if dst is non zero using temp for calculations on Power Version<10. >> 348: // Set register dst to true if dst is non zero for Power 10 and above machines. > > Please don't use the term "true". It's not defined in assembler. > In addition: We do the same for older processors. Just by other instructions. > I think these 2 lines could be removed. Can i instead just write 1? I feel the comments can explain the logic in short ? 
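For readers following the "convert != 0 to 1" discussion above, here is a minimal C++ sketch of one common branch-free formulation: OR the value with its two's-complement negation, then shift the top bit down. The helper names `normalize_bool32` and `normalize_bool64` are hypothetical and do not appear in the PR; whether the PPC64 assembler code in the PR uses exactly this instruction sequence is an assumption, and the Power10 `setbcr` path is not modelled here.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not HotSpot code.
// For any non-zero x, either x or (0 - x) has its most significant bit set,
// so OR-ing the two and shifting that bit down yields 1. For x == 0 both
// operands are 0 and the result is 0. No branch is needed.
constexpr uint32_t normalize_bool32(uint32_t x) {
  return (x | (0u - x)) >> 31;
}

constexpr uint64_t normalize_bool64(uint64_t x) {
  return (x | (uint64_t{0} - x)) >> 63;
}
```

Having separate 32-bit and 64-bit variants mirrors the PR description's remark that a function is wanted for both operand widths.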
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657400271 From mdoerr at openjdk.org Thu Jun 27 16:05:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 16:05:13 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 15:45:28 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3414: >> >>> 3412: #ifdef ASSERT >>> 3413: { >>> 3414: // r_result should have either 0 or 1 value >> >> Could be done with one check (clear bit 0). > > I guess this check requires the r_result to be in range `[0,1]`. So it checks (without modifying) whether the value is greater than or equal to 0 & less than equal to 1. > > By "clear bit 0" did you mean to `and` it with `1` and then do the check ? I'm really not sure what were your thoughts. Could you please elaborate ? There are several ways to clear the least significant bit. E.g. `and` it with ~1 and compare the result with 0. Or shift right by 1 and compare the result with 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1657403684 From mli at openjdk.org Thu Jun 27 16:19:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 27 Jun 2024 16:19:16 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> <2VxBcA-0qxX3N35u5vnKyT920nTH5llf2k5_sKQcqT8=.23823400-536f-458e-baf7-53f99547abc4@github.com> Message-ID: <57IV227QlMET8ejw55e-zQcExBjx2V_jJQWw0PsozwY=.764ee06f-0c0d-4c56-91d8-61dd2572588b@github.com> On Thu, 27 Jun 2024 11:53:38 GMT, Fei Gao wrote: >> in progress... > > Hi @Hamlin-Li , thanks for your work. 
> > I tried to run benchmarks, [FloatMaxVector](https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L1068) and [DoubleMaxVector](https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L1068), on different aarch64 machines. > > Here is the data I got for `TANH`, with args `-i 5 -f 3 -wi 3 -foe true -jvmArgs -Xms4g -Xmx4g -XX:+AlwaysPreTouch -XX:ObjectAlignmentInBytes=16`: > > > // NEON machine > Benchmark (size) Mode Cnt Units Perf gain > DoubleMaxVector.TANH 1024 thrpt 15 ops/ms -38% > FloatMaxVector.TANH 1024 thrpt 15 ops/ms -26% > > > > // 128-bit sve machine (TANH also implemented with NEON) > Benchmark (size) Mode Cnt Units Perf gain > DoubleMaxVector.TANH 1024 thrpt 15 ops/ms -19% > FloatMaxVector.TANH 1024 thrpt 15 ops/ms ~00% > > > The performance of vector stubs for `TANH` looks not quite stable on different NEON machines. Since this pr does not provide `TANH` interface on sve machines for [the performance regression](https://github.com/openjdk/jdk/pull/16234/commits/2a7730d6acbac80438a43d1502cff6a476f8b5b5#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R8521-R8525), how about also disabling it on NEON for the same reason? WDYT? > > Thanks. @fg1417 Thanks for testing. Sure, I can do that based on your test result, I will restart work on it after https://github.com/openjdk/jdk/pull/19185 is integrated. @theRealAph I lost my previous vm, so currently I only generate the header files, but did not test performance since last time, I don't remember I had special vm options passed in at that time. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2195132669 From mdoerr at openjdk.org Thu Jun 27 16:19:15 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 16:19:15 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v3] In-Reply-To: References: <9V7t8GgSok8bSTA5zYzfy23vyFN-mQPL4BtYjraf76Y=.e03244df-b669-4a89-8532-fd524dc5aa54@github.com> <3zN7SpsZSDvAbEJbvu9GN-aZMa8Y7h3J7bmcj5-WlQ8=.36af9c3e-ae6e-42dd-9acf-0cf8bb5e20cf@github.com> Message-ID: On Thu, 27 Jun 2024 15:59:41 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 348: >> >>> 346: // Branch-free implementation to convert !0 to false >>> 347: // Set register dst to true if dst is non zero using temp for calculations on Power Version<10. >>> 348: // Set register dst to true if dst is non zero for Power 10 and above machines. >> >> Please don't use the term "true". It's not defined in assembler. >> In addition: We do the same for older processors. Just by other instructions. >> I think these 2 lines could be removed. > > Can i instead just write 1? I feel the comments can explain the logic in short ? Maybe: "Set register dst to 1 if dst is non-zero. Use setbcr instruction on Power10." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657431435 From ccheung at openjdk.org Thu Jun 27 16:21:12 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 27 Jun 2024 16:21:12 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v2] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 03:11:41 GMT, Ioi Lam wrote: >> Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. >> >> - This PR uses the same framework introduced in #19355 and just added handling for methods. 
>> - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into 8309634-resolve-methods-at-dumptime > - @calvinccheung and @matias9927 comments > - Fixed whitespaces > - 8309634: Resolve CONSTANT_MethodRef at CDS dump time Updates look good. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19866#pullrequestreview-2145900266 From sroy at openjdk.org Thu Jun 27 16:27:36 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 27 Jun 2024 16:27:36 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v4] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - Comments and spaces - Comments and spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/5d052d19..05e6b031 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=02-03 Stats: 18 lines in 3 files changed: 4 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From duke at openjdk.org Thu Jun 27 16:38:14 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 27 Jun 2024 16:38:14 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 14 Jun 2024 16:15:29 GMT, snadampal wrote: >> Hi @theRealAph ! You may find the latest version here: https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b . I gave a short explanation in the commit message, feel free to ask for more details if required. >> >> Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before next week at best. Until it's fixed, it's not possible to run the benchmarks. Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here. > > Hi @mikabl-arm , the improvements for larger sizes look impressive, good work! > any timeline for getting it merged? Hi @snadampal ! Glad that you find the change useful. Thanks to @nick-arm I have made some progress fixing the existing issues, so I expect to update the PR before next Tuesday.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2195181632 From mdoerr at openjdk.org Thu Jun 27 16:42:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 16:42:14 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v4] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 16:27:36 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - Comments and spaces > - Comments and spaces Thanks! This looks good now, except very minor nits. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 27: > 25: > 26: #include "precompiled.hpp" > 27: #include "asm/assembler.inline.hpp" I think this is not needed because it is included via `macroAssembler.inline.hpp`. 
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 351: > 349: > 350: if (VM_Version::has_brw()) { > 351: if(is_64bit) { Coding style should use a whitespace: `if (` src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 354: > 352: cmpdi(CCR0, dst, 0); > 353: } > 354: else { Coding style: `} else {` src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 359: > 357: setbcr(dst, CCR0, Assembler::zero); > 358: } > 359: else { Coding style: `} else {` src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 362: > 360: neg(temp, dst); > 361: orr(temp, dst, temp); > 362: if(is_64bit) { Coding style should use a whitespace: `if (` ------------- PR Review: https://git.openjdk.org/jdk/pull/19886#pullrequestreview-2145925527 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657457757 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657449820 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657450572 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657452613 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657451092 From mdoerr at openjdk.org Thu Jun 27 16:42:15 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 16:42:15 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v2] In-Reply-To: References: <1_yx1uaGpcU9s6yBHI6WHy-P5kOJS5xIK7NnQBkVM3E=.513f043b-786d-4120-998a-8f4d209cd858@github.com> <-SRQP2arPvQSbKpKXBvKxo4OQwWK-O_vPPbVxyhR8Bg=.bf68f7aa-5466-426c-ac9f-78ce7eb8cf7c@github.com> Message-ID: <-7A4ADZIoN_90lFtFG6U7iqUf80uXvdDQq2GvQoX4gU=.25f66709-0c36-4f7f-bf50-50ab4f1ccbe2@github.com> On Thu, 27 Jun 2024 08:05:55 GMT, Suchismith Roy wrote: >> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 356: >> >>> 354: else >>> 355: cmpwi(CCR0, dst, 0); >>> 356: setbcr(dst, CCR0, Assembler::zero); >> >> This is what I understood after implementation & definition: >> >> If bit BI of the CR contains a 1, register RT is set to 0. 
Otherwise, register RT is set to 1. >> >> CCR0 will contain `1` when `dst == 0`. then `dst` will be set to `1` by `setbcr`. > > Yes the bit related to value of zero will be set. and setbcr will return 0 , if value is 1 in the CCR0. So the return register will have a value of 0, since it is 0. Similarly for non zero value, setbcr will return 1, and return register will have value of 1, which is the expected behaviour. I'd prefer using `Assembler::equal` because we have compared with 0 before. Also, please add curly braces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657453959 From sroy at openjdk.org Thu Jun 27 17:04:23 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 27 Jun 2024 17:04:23 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v5] In-Reply-To: References: Message-ID: <8kYHeAYif4020-JpG7jEbObkLDSDpf7ZSDbWW8RcmPA=.783fc4df-0a12-4882-bf01-0f96eafeb549@github.com> > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
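For readers following the thread, the branch-free fallback quoted in the review (`neg`, then `orr`) can be modeled in plain C++. This is only a sketch under stated assumptions: the final logical right shift by 63 (or 31) is a guess at how the sequence completes, since the thread quotes only `neg` and `orr`, and `normalize_bool64` / `normalize_bool32` are hypothetical names, not HotSpot functions.

```cpp
#include <cstdint>

// Hypothetical model of the pre-Power10 fallback path.
// x | -x has its most significant bit set exactly when x != 0,
// so shifting that bit down to position 0 yields 0 or 1.
static inline uint64_t normalize_bool64(uint64_t x) {
  uint64_t t = 0 - x;   // models neg(temp, dst)
  t |= x;               // models orr(temp, dst, temp)
  return t >> 63;       // ASSUMPTION: logical shift right by 63
}

// 32-bit variant of the same idea (ASSUMPTION: shift by 31).
static inline uint32_t normalize_bool32(uint32_t x) {
  uint32_t t = 0 - x;
  t |= x;
  return t >> 31;
}
```

The Power10 path discussed in the thread reaches the same result with a compare against zero followed by `setbcr`, which is not modeled here.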
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: braces,assmebler:equal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/05e6b031..30f566b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=03-04 Stats: 9 lines in 1 file changed: 0 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From duke at openjdk.org Thu Jun 27 17:13:38 2024 From: duke at openjdk.org (duke) Date: Thu, 27 Jun 2024 17:13:38 GMT Subject: RFR: 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 09:25:31 GMT, Gui Cao wrote: > Branch condition for r_array_index wraparound checking in lookup_secondary_supers_table_slow_path is wrong. > > // Check for wraparound. > Label skip; > bge(r_array_length, r_array_index, skip); > mv(r_array_index, zr); > bind(skip); > > As discussed at https://github.com/openjdk/jdk/pull/19320/files#r1650548279 . If length == index, then we must set index to 0. That is `blt(r_array_index,r_array_length,skip);`. > > ### Correctness testing: > - [x] Run tier1-3 tests on SOPHON SG2042 (release) > > ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb > without this patch: > > SecondarySupersLookup.testNegative00 avgt 15 13.275 ± 0.223 ns/op > SecondarySupersLookup.testNegative01 avgt 15 13.264 ± 0.201 ns/op > SecondarySupersLookup.testNegative02 avgt 15 13.261 ± 0.194 ns/op > SecondarySupersLookup.testNegative03 avgt 15 13.271 ± 0.210 ns/op > SecondarySupersLookup.testNegative04 avgt 15 13.265 ± 0.201 ns/op > SecondarySupersLookup.testNegative05 avgt 15 13.258 ±
0.191 ns/op > SecondarySupersLookup.testNegative06 avgt 15 13.280 ± 0.225 ns/op > SecondarySupersLookup.testNegative07 avgt 15 13.268 ± 0.201 ns/op > SecondarySupersLookup.testNegative08 avgt 15 13.266 ± 0.202 ns/op > SecondarySupersLookup.testNegative09 avgt 15 13.261 ± 0.196 ns/op > SecondarySupersLookup.testNegative10 avgt 15 13.268 ± 0.198 ns/op > SecondarySupersLookup.testNegative16 avgt 15 13.268 ± 0.205 ns/op > SecondarySupersLookup.testNegative20 avgt 15 13.284 ± 0.231 ns/op > SecondarySupersLookup.testNegative30 avgt 15 13.281 ± 0.226 ns/op > SecondarySupersLookup.testNegative32 avgt 15 13.273 ± 0.215 ns/op > SecondarySupersLookup.testNegative40 avgt 15 13.287 ± 0.233 ns/op > SecondarySupersLookup.testNegative50 avgt 15 13.292 ± 0.242 ns/op > SecondarySupersLookup.testNegative55 avgt 15 53.064 ± 0.757 ns/op > SecondarySupersLookup.testNegative56 avgt 15 53.052 ± 0.767 ns/op > SecondarySupersLookup.testNegative57 avgt 15 53.068 ± 0.803 ns/op > SecondarySupersLookup.testNegative58 avgt 15 53.076 ± 0.776 ns/op > SecondarySupersLookup.testNegative59 avgt 15 53.095 ± 0.846 ns/op > SecondarySupersLookup.testNegative60 avgt 15 75.106 ± 1.033 ns/op > SecondarySupersLookup.testNegative61 avgt 15 76.832 ± 4.047 ns/op > SecondarySupersLookup.testNegative62 avgt 15 75.085 ± 1.010 ns/op > SecondarySupersLookup.testNegative63 avgt 15 153.709 ± 0.893 ns/op > SecondarySupersLookup.testNegative64 ... @zifeihan Your change (at version 4780027f9f9e43ebb6e17b6079abb4b993395b8e) is now ready to be sponsored by a Committer.
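The off-by-one in the wraparound check quoted in this message can be illustrated in plain C++. This is a hedged model, not the HotSpot code: `wrap_before_fix` and `wrap_after_fix` are hypothetical names, and each function only mirrors the branch condition of the corresponding RISC-V sequence, where `bge(a, b, L)` branches when `a >= b` and `blt(a, b, L)` branches when `a < b`.

```cpp
#include <cstdint>

// Models: bge(r_array_length, r_array_index, skip); mv(r_array_index, zr);
// The reset is skipped whenever length >= index, so index == length
// is left unchanged and points one past the end of the array.
static inline int64_t wrap_before_fix(int64_t index, int64_t length) {
  if (!(length >= index)) {
    index = 0;
  }
  return index;
}

// Models: blt(r_array_index, r_array_length, skip); mv(r_array_index, zr);
// The reset is skipped only while index < length, so index == length wraps to 0.
static inline int64_t wrap_after_fix(int64_t index, int64_t length) {
  if (!(index < length)) {
    index = 0;
  }
  return index;
}
```

With a length of 4, an index of 4 stays at 4 under the old condition and becomes 0 under the new one, which matches the statement "If length == index, then we must set index to 0."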
------------- PR Comment: https://git.openjdk.org/jdk/pull/19852#issuecomment-2194141988 From duke at openjdk.org Thu Jun 27 17:31:58 2024 From: duke at openjdk.org (duke) Date: Thu, 27 Jun 2024 17:31:58 GMT Subject: RFR: 8335108: Build error after JDK-8333658 due to class templates In-Reply-To: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> References: <6ljQH6-vh9FlOzTeKq4Satpqa4V-Jj-KYzh-4bmTWq8=.7b6ebf6a-4ed8-4a58-a5b6-88083dd4df6d@github.com> Message-ID: On Tue, 25 Jun 2024 18:52:20 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8335108](https://bugs.openjdk.org/browse/JDK-8335108). > > The error arises as template-id is not allowed for constructor/destructor in C++20. > > Testing: > - [x] Compilation succeeds with g++ 14.1.1. > > Thanks, > Sonia @SoniaZaldana Your change (at version 141eced97236b58a80a625eafdd09495332391aa) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19890#issuecomment-2192099459 From sroy at openjdk.org Thu Jun 27 17:34:11 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Thu, 27 Jun 2024 17:34:11 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v6] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). 
> > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - remove assembler header - remove assembler header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/30f566b7..3bff218b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From mdoerr at openjdk.org Thu Jun 27 17:39:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 17:39:55 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 17:34:11 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
> > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - remove assembler header > - remove assembler header I had forgotten one minor thing. Otherwise, LGTM. I can test it over the weekend. src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 298: > 296: > 297: // Branch-free implementation to convert !=0 to 1. > 298: void normalize_bool(Register dst, Register src, bool use_64bit); Better use "temp" instead of "src" which may be confusing. ------------- PR Review: https://git.openjdk.org/jdk/pull/19886#pullrequestreview-2146082835 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1657543879 From amitkumar at openjdk.org Thu Jun 27 18:07:18 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 27 Jun 2024 18:07:18 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 16:02:04 GMT, Martin Doerr wrote: >> I guess this check requires the r_result to be in range `[0,1]`. So it checks (without modifying) whether the value is greater than or equal to 0 & less than or equal to 1. >> >> By "clear bit 0" did you mean to `and` it with `1` and then do the check? I'm really not sure what your thoughts were. Could you please elaborate? > > There are several ways to clear the least significant bit. E.g. `and` it with ~1 and compare the result with 0. Or shift right by 1 and compare the result with 0. But if we clobber this then verification will fail, because in this method we are dependent on the value present in `r_result`. It's just that I'm making sure that whatever value is there it's either `1` or `0`.
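The two checks suggested in the quoted reply can be sketched in plain C++. This is an illustration only, with hypothetical function names; it writes the intermediate result to a separate temporary, so the value being verified is left untouched, which is the concern raised in this message.

```cpp
#include <cstdint>

// A value is 0 or 1 exactly when every bit above bit 0 is clear.

// Variant 1: mask off the least significant bit and compare with 0.
static inline bool is_zero_or_one_mask(uint64_t r_result) {
  uint64_t tmp = r_result & ~uint64_t{1};  // r_result itself is not modified
  return tmp == 0;
}

// Variant 2: shift right by 1 and compare with 0.
static inline bool is_zero_or_one_shift(uint64_t r_result) {
  uint64_t tmp = r_result >> 1;            // r_result itself is not modified
  return tmp == 0;
}
```

Whether a given instruction set offers a three-operand form that keeps the source register intact is a separate question that this sketch does not answer.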
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1657574235 From vlivanov at openjdk.org Thu Jun 27 18:27:31 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Jun 2024 18:27:31 GMT Subject: RFR: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:47:57 GMT, Vladimir Ivanov wrote: > JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. > > Testing: hs-tier1 - hs-tier6 Thanks for the reviews, Vladimir, Coleen, and David. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19911#issuecomment-2195414750 From vlivanov at openjdk.org Thu Jun 27 18:27:31 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 27 Jun 2024 18:27:31 GMT Subject: Integrated: 8304693: Remove -XX:-UseVtableBasedCHA In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:47:57 GMT, Vladimir Ivanov wrote: > JDK-8266074 introduced new CHA implementation and a flag (-XX:-UseVtableBasedCHA) to switch back to the original implementation for diagnostic purposes. Vtable-based CHA implementation has been turned on by default since 17 and now the time has come to remove UseVtableBasedCHA flag. > > Testing: hs-tier1 - hs-tier6 This pull request has now been integrated. 
Changeset: 243bae7d Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/243bae7dc0c3e71c02ffed9e1ee7d436af11d3b9 Stats: 166 lines in 11 files changed: 1 ins; 154 del; 11 mod 8304693: Remove -XX:-UseVtableBasedCHA Reviewed-by: kvn, coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/19911 From iklam at openjdk.org Thu Jun 27 18:35:21 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Jun 2024 18:35:21 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v3] In-Reply-To: References: Message-ID: > Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. > > - This PR uses the same framework introduced in #19355 and just added handling for methods. > - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into 8309634-resolve-methods-at-dumptime - Merge branch 'master' into 8309634-resolve-methods-at-dumptime - @calvinccheung and @matias9927 comments - Fixed whitespaces - 8309634: Resolve CONSTANT_MethodRef at CDS dump time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19866/files - new: https://git.openjdk.org/jdk/pull/19866/files/fd039bef..368621a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19866&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19866&range=01-02 Stats: 5646 lines in 136 files changed: 3442 ins; 1560 del; 644 mod Patch: https://git.openjdk.org/jdk/pull/19866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19866/head:pull/19866 PR: https://git.openjdk.org/jdk/pull/19866 From duke at openjdk.org Thu Jun 27 18:42:22 2024 From: duke at openjdk.org (Larry Cable) Date: Thu, 27 Jun 2024 18:42:22 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v7] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:54:46 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. 
Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Refactor mount info matching to helper function > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Remove problem listing of PlainRead which is reworked here > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - ... 
and 7 more: https://git.openjdk.org/jdk/compare/baafa662...532ea33b src/hotspot/share/prims/jvm.cpp line 504: > 502: JVM_LEAF(jboolean, JVM_IsContainerized(void)) > 503: #ifdef LINUX > 504: if (OSContainer::is_containerized()) { // nit: personal preference... return OSContainer::isContainerized() ? JNI_TRUE : JNI_FALSE; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1657650139 From duke at openjdk.org Thu Jun 27 19:08:21 2024 From: duke at openjdk.org (Larry Cable) Date: Thu, 27 Jun 2024 19:08:21 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v7] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:54:46 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. 
>> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Refactor mount info matching to helper function > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Remove problem listing of PlainRead which is reworked here > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - ... and 7 more: https://git.openjdk.org/jdk/compare/baafa662...532ea33b Marked as reviewed by larry-cable at github.com (no known OpenJDK username). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-2146294816 From dlong at openjdk.org Thu Jun 27 20:05:27 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 27 Jun 2024 20:05:27 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v19] In-Reply-To: References: Message-ID: <-SCwSVP6zTNao7X4VlIPRor8OU9vDg79oGVDWTf4XCM=.b72cbf6f-66e7-4fb3-b387-e00ae0537ac6@github.com> On Thu, 27 Jun 2024 08:32:41 GMT, Robbin Ehn wrote: >> Hi all, please consider! >> >> Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). >> Using a very small application or running for a very short time we have fast patchable calls. >> But any normal application running longer will increase the code size and code churn/fragmentation. >> So whether or not you get fast hot calls relies on luck. >> >> To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. >> This would be the common case for a patchable call. >> >> Code stream: >> JAL >> Stubs: >> AUIPC >> LD >> JALR >> >> >> >> On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. >> Even if you don't have that problem having a call to a jump is not the fastest way. >> Loading the address avoids the pitfalls of cmodx. >> >> This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, >> and instead do by default: >> >> Code stream: >> AUIPC >> LD >> JALR >> Stubs: >> >> >> An experimental option for turning trampolines back on exists. >> >> It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instructions in order (meaning we will avoid a lot of issues which arm has)) when in reach and vice-versa.
>> >> Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): >> >> fop (msec) 2239 | 2128 = 0.950424 >> h2 (msec) 18660 | 16594 = 0.889282 >> jython (msec) 22022 | 21925 = 0.995595 >> luindex (msec) 2866 | 2842 = 0.991626 >> lusearch (msec) 4108 | 4311 = 1.04942 >> lusearch-fix (msec) 4406 | 4116 = 0.934181 >> pmd (msec) 5976 | 5897 = 0.98678 >> jython (msec) 22022 | 21925 = 0.995595 >> Avg: 0.974112 >> fop(xcomp) (msec) 2721 | 2714 = 0.997427 >> h2(xcomp) ... > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Rename lc > - Merge branch 'master' into 8332689 > - Merge branch 'master' into 8332689 > - Comments > - Missed in merge-fixes, minor revert > - Merge branch 'master' into 8332689 > - Minor review comments > - Merge branch 'master' into 8332689 > - To be pushed > - Merge branch 'master' into 8332689 > - ... and 18 more: https://git.openjdk.org/jdk/compare/46b817b7...442680b4 src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1275: > 1273: // > 1274: // Return: the call PC or null if CodeCache is full. > 1275: address patchable_far_call(Address entry) { For runtime_call_type, I don't think we ever update the target/destination, so the name "patchable" seems not quite right for them. Also, for runtime_call_type, since they never change, we can decide early if a near call is possible when the destination is always reachable (based on the bounds of code cache), which is what aarch64 does for trampoline_call. 
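The idea in this review comment, deciding once whether a fixed destination is reachable from anywhere in the code cache, can be sketched as follows. This is a simplified model and not how HotSpot implements it: `always_reachable` is a hypothetical helper, addresses are plain integers, and the branch reach is passed in by the caller (the PR description quotes roughly +/- 1 MB for JAL).

```cpp
#include <cstdint>

// Hypothetical helper: returns true if a branch with the given reach (in bytes,
// treated as symmetric for simplicity) can hit `dest` from every possible call
// site inside the code cache bounds [cc_low, cc_high].
static inline bool always_reachable(uint64_t dest, uint64_t cc_low,
                                    uint64_t cc_high, uint64_t reach) {
  // The call sites farthest from dest are at the two ends of the code cache,
  // so checking both ends covers every site in between.
  uint64_t dist_low  = dest > cc_low  ? dest - cc_low  : cc_low  - dest;
  uint64_t dist_high = dest > cc_high ? dest - cc_high : cc_high - dest;
  return dist_low < reach && dist_high < reach;
}
```

If the check holds for a destination that never changes, a near call could be chosen when the call is emitted; otherwise the load-based sequence from the PR description would remain the fallback.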
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1657748195 From iklam at openjdk.org Thu Jun 27 20:12:25 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Jun 2024 20:12:25 GMT Subject: RFR: 8309634: Resolve CONSTANT_MethodRef at CDS dump time [v3] In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 18:17:00 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into 8309634-resolve-methods-at-dumptime >> - Merge branch 'master' into 8309634-resolve-methods-at-dumptime >> - @calvinccheung and @matias9927 comments >> - Fixed whitespaces >> - 8309634: Resolve CONSTANT_MethodRef at CDS dump time > > Thanks for the updates! Thanks @matias9927 and @calvinccheung for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19866#issuecomment-2195587619 From iklam at openjdk.org Thu Jun 27 20:12:25 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Jun 2024 20:12:25 GMT Subject: Integrated: 8309634: Resolve CONSTANT_MethodRef at CDS dump time In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 17:21:18 GMT, Ioi Lam wrote: > Resolve `CONSTANT_MethodRef` entries during CDS dump time to improve start-up performance. > > - This PR uses the same framework introduced in #19355 and just added handling for methods. > - Support for getstatic/putstatic/invokestatic will be done separately in [JDK-8334898](https://bugs.openjdk.org/browse/JDK-8334898) This pull request has now been integrated. 
Changeset: c35e58a5 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/c35e58a5adf06e25a3b482e2be384af95a84f11a Stats: 354 lines in 13 files changed: 312 ins; 6 del; 36 mod 8309634: Resolve CONSTANT_MethodRef at CDS dump time Reviewed-by: matsaave, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/19866 From mdoerr at openjdk.org Thu Jun 27 20:25:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Jun 2024 20:25:19 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 18:04:24 GMT, Amit Kumar wrote: >> There are several ways to clear the least significant bit. E.g. `and` it with ~1 and compare the result with 0. Or shift right by 1 and compare the result with 0. > > But if we clobber this then verification will fail. Because in this method we are dependent on value present in `r_result`. It's just that I'm making sure that whatever value is there it's either `1` or `0`. There are instructions which don't kill the src operand. E.g. ngrk or slrg. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1657775357 From gcao at openjdk.org Fri Jun 28 01:47:34 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 28 Jun 2024 01:47:34 GMT Subject: Integrated: 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path In-Reply-To: References: Message-ID: On Mon, 24 Jun 2024 09:25:31 GMT, Gui Cao wrote: > Branch condition for r_array_index wraparound checking in lookup_secondary_supers_table_slow_path is wrong. > > // Check for wraparound. > Label skip; > bge(r_array_length, r_array_index, skip); > mv(r_array_index, zr); > bind(skip); > > As discussed at https://github.com/openjdk/jdk/pull/19320/files#r1650548279 . If length == index, then we must set index to 0. That is `blt(r_array_index,r_array_length,skip);`. 
>
> ### Correctness testing:
> - [x] Run tier1-3 tests on SOPHON SG2042 (release)
>
> ### JMH tested on Banana Pi BPI-F3 board (has Zbb) and Enable UseZbb
> without this patch:
>
> SecondarySupersLookup.testNegative00 avgt 15 13.275 ± 0.223 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 13.264 ± 0.201 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 13.261 ± 0.194 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 13.271 ± 0.210 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 13.265 ± 0.201 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 13.258 ± 0.191 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 13.280 ± 0.225 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 13.268 ± 0.201 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 13.266 ± 0.202 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 13.261 ± 0.196 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 13.268 ± 0.198 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 13.268 ± 0.205 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 13.284 ± 0.231 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 13.281 ± 0.226 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 13.273 ± 0.215 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 13.287 ± 0.233 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 13.292 ± 0.242 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 53.064 ± 0.757 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 53.052 ± 0.767 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 53.068 ± 0.803 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 53.076 ± 0.776 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 53.095 ± 0.846 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 75.106 ± 1.033 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 76.832 ± 4.047 ns/op
> SecondarySupersLookup.testNegative62 avgt 15 75.085 ± 1.010 ns/op
> SecondarySupersLookup.testNegative63 avgt 15 153.709 ± 0.893 ns/op
> SecondarySupersLookup.testNegative64 ...
This pull request has now been integrated. Changeset: cd46c87d Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/cd46c87dc916b2b74067accf80c62df1792f74cf Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8334843: RISC-V: Fix wraparound checking for r_array_index in lookup_secondary_supers_table_slow_path Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/19852 From kbarrett at openjdk.org Fri Jun 28 03:49:50 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 28 Jun 2024 03:49:50 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code Message-ID: Please review this change that replaces some uses of literal 0 as a null pointer constant in gc code to instead use nullptr. There is also one place where the use was eliminated entirely, because it was dead code following a [[noreturn]] call. Testing: mach5 tier1. ------------- Commit messages: - simple fixes in gc code Changes: https://git.openjdk.org/jdk/pull/19934/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19934&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335294 Stats: 31 lines in 10 files changed: 0 ins; 1 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/19934.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19934/head:pull/19934 PR: https://git.openjdk.org/jdk/pull/19934 From stuefe at openjdk.org Fri Jun 28 06:01:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Jun 2024 06:01:18 GMT Subject: RFR: 8334738: os::print_hex_dump should optionally print ASCII [v2] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 07:04:50 GMT, David Holmes wrote: > Okay - seems reasonable. > > FYI I am away for a few days. > > Thanks Many thanks, David, and have a nice time. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19835#issuecomment-2196197027 From amitkumar at openjdk.org Fri Jun 28 06:40:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 06:40:49 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v7] In-Reply-To: References: Message-ID: > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ? 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ? 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ? 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ? 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ? 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ? 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ? 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ? 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ? 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ? 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ? 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ? 0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ? 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ? 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ? 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ? 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ? 
0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ? 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ? 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ? 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ? 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ? 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ? 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ? 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ? 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ? 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ? 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ? 1.628 ns... Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: comments from martin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/ebbca614..d27d2cc3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=05-06 Stats: 16 lines in 2 files changed: 5 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From amitkumar at openjdk.org Fri Jun 28 06:46:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 06:46:23 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v7] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Tue, 25 Jun 2024 11:06:23 GMT, Amit Kumar wrote: >> s390x port for recursive locking. 
>> >> testing: >> - [x] build fastdebug-vm >> - [x] build slowdebug-vm >> - [x] build release-vm >> - [x] build optimized-vm >> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (release-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] tier1 with fastdebug-vm >> - [x] tier1 with slowdebug-vm >> - [x] tier1 with release-vm >> >> *BenchMarks*: >> >> Results from Performance LPARs : >> >> >> Locking Mode = 1 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> Locking Mode = 1 (with patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 0.519 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> >> >> >> Locking Mode = 2 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 
2.229 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ± 0.416 ns/op >> LockUnlock.te... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > comments from Lutz Thanks Lutz, Axel for the reviews. I did one more round of testing and things seem fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18878#issuecomment-2196248511 From amitkumar at openjdk.org Fri Jun 28 06:46:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 06:46:23 GMT Subject: Integrated: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Sun, 21 Apr 2024 16:30:43 GMT, Amit Kumar wrote: > s390x port for recursive locking. > > testing: > - [x] build fastdebug-vm > - [x] build slowdebug-vm > - [x] build release-vm > - [x] build optimized-vm > - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (release-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] tier1 with fastdebug-vm > - [x] tier1 with slowdebug-vm > - [x] tier1 with release-vm > > *BenchMarks*: > > Results from Performance LPARs : > > > Locking Mode = 1 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.144 ± 0.035 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ± 89.475 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ± 0.559 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ± 3.036 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ±
1.793 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > Locking Mode = 1 (with patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 0.519 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > > > > Locking Mode = 2 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 2.229 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 0.416 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 424.241 ? 0.840 ns/op > Finished running test 'micro:vm.lang.Lo... This pull request has now been integrated. 
Changeset: d457609f Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/d457609f700bbb1fed233f1a04501c995852e5ac Stats: 553 lines in 9 files changed: 426 ins; 64 del; 63 mod 8319947: Recursive lightweight locking: s390x implementation Reviewed-by: aboldtch, lucy ------------- PR: https://git.openjdk.org/jdk/pull/18878 From amitkumar at openjdk.org Fri Jun 28 06:49:46 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 06:49:46 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v8] In-Reply-To: References: Message-ID: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ? 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ? 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ? 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ? 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ? 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ? 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ? 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ? 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ? 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ? 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ? 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ? 
0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ? 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ? 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ? 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ? 0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ? 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ? 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ? 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ? 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ? 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ? 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ? 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ? 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ? 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ? 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ? 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ? 1.628 ns... Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: - Merge branch 'master' into ssc_v0 - comments from martin - add2reg -> z_la - comments from Lutz - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp - rename: r_scratch to r_result in repne_scan method - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp - Removes unused Labels & makes comment more sensible - [s390x] secondary super cache port ------------- Changes: https://git.openjdk.org/jdk/pull/19544/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=07 Stats: 431 lines in 5 files changed: 430 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From rehn at openjdk.org Fri Jun 28 07:04:25 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 28 Jun 2024 07:04:25 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v19] In-Reply-To: <-SCwSVP6zTNao7X4VlIPRor8OU9vDg79oGVDWTf4XCM=.b72cbf6f-66e7-4fb3-b387-e00ae0537ac6@github.com> References: <-SCwSVP6zTNao7X4VlIPRor8OU9vDg79oGVDWTf4XCM=.b72cbf6f-66e7-4fb3-b387-e00ae0537ac6@github.com> Message-ID: On Thu, 27 Jun 2024 20:02:18 GMT, Dean Long wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: >> >> - Rename lc >> - Merge branch 'master' into 8332689 >> - Merge branch 'master' into 8332689 >> - Comments >> - Missed in merge-fixes, minor revert >> - Merge branch 'master' into 8332689 >> - Minor review comments >> - Merge branch 'master' into 8332689 >> - To be pushed >> - Merge branch 'master' into 8332689 >> - ... and 18 more: https://git.openjdk.org/jdk/compare/46b817b7...442680b4 > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1275: > >> 1273: // >> 1274: // Return: the call PC or null if CodeCache is full. 
>> 1275: address patchable_far_call(Address entry) { > > For runtime_call_type, I don't think we ever update the target/destination, so the name "patchable" seems not quite right for them. Also, for runtime_call_type, since they never change, we can decide early if a near call is possible when the destination is always reachable (based on the bounds of code cache), which is what aarch64 does for trampoline_call. Yes. My thinking was, the site is still patchable, even if some sites don't need that capability. The reason why this patch ignores near calls is the short reach of JAL, +-1MB (so normally only a few stubs can be reached from a few nmethods). But it is on the enhancement list. I don't mind changing the name, feel free to suggest something! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1658251519 From sgehwolf at openjdk.org Fri Jun 28 08:43:31 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 28 Jun 2024 08:43:31 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v7] In-Reply-To: References: Message-ID: On Tue, 25 Jun 2024 13:54:46 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method.
Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Refactor mount info matching to helper function > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Remove problem listing of PlainRead which is reworked here > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - jcheck fixes > - ... and 7 more: https://git.openjdk.org/jdk/compare/baafa662...532ea33b Thanks for the review! 
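The read-only-mount property mentioned in the quoted PR description can be illustrated with a small, self-contained C++ sketch. It is an assumption-laden illustration, not the JVM's implementation: the helper names `has_ro_option` and `is_read_only_cgroup_mount` are hypothetical, the input is a single line in the `/proc/self/mountinfo` format, and the additional "is any memory or cpu limit present" check from the description is not modelled here.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if the comma-separated option list contains
// the exact option "ro".
static bool has_ro_option(const std::string& opts) {
  std::istringstream in(opts);
  std::string opt;
  while (std::getline(in, opt, ',')) {
    if (opt == "ro") {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: true if one mountinfo line describes a cgroup or
// cgroup2 filesystem whose per-mount options include "ro".
// mountinfo layout: id parent major:minor root mount-point mount-options
//                   [optional fields...] - fstype source super-options
bool is_read_only_cgroup_mount(const std::string& mountinfo_line) {
  std::istringstream in(mountinfo_line);
  std::vector<std::string> fields;
  std::string field;
  while (in >> field) {
    fields.push_back(field);
  }
  if (fields.size() < 7) {
    return false;  // too short to hold the six fixed fields plus "-"
  }
  // Skip the variable number of optional fields up to the "-" separator.
  std::size_t sep = 6;
  while (sep < fields.size() && fields[sep] != "-") {
    ++sep;
  }
  if (sep + 1 >= fields.size()) {
    return false;  // no separator or no filesystem type after it
  }
  const std::string& fstype = fields[sep + 1];
  if (fstype != "cgroup" && fstype != "cgroup2") {
    return false;
  }
  return has_ro_option(fields[5]);  // sixth field: per-mount options
}
```

A caller would feed each line of `/proc/self/mountinfo` to `is_read_only_cgroup_mount`; how the result is combined with limit checks is left out, since the message does not spell that out.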
------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2196421487 From aph at openjdk.org Fri Jun 28 09:05:27 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 28 Jun 2024 09:05:27 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v8] In-Reply-To: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> References: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> Message-ID: <0ahlnaBKJFtHXM-a55hkMJVa6QpedeDLvjKG9B6iGis=.43b84ccc-4254-417c-ad8a-f724723fe70f@github.com> On Fri, 28 Jun 2024 06:49:46 GMT, Amit Kumar wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ? 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ? 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ? 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ? 0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ? 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ? 0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ? 0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ? 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ? 0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ? 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ? 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ? 
0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ? 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ? 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ? 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ? 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ? 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ? 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ? 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ? 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ? 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ? 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ? 2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ? 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ? 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ? 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ? 0.557 ... > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'master' into ssc_v0 > - comments from martin > - add2reg -> z_la > - comments from Lutz > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - rename: r_scratch to r_result in repne_scan method > - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp > - Removes unused Labels & makes comment more sensible > - [s390x] secondary super cache port src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3329: > 3327: // NOTE: please load 0 only in r_result, for now lookup_secondary_supers_table sets r_result to 0 > 3328: // clear_reg(r_result, true /* whole_reg */, false /* set_cc */); // let's hope that search will be a success > 3329: z_cghi(r_result, 0); I don't understand this comment. What does "for now" mean? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658389527 From amitkumar at openjdk.org Fri Jun 28 09:08:28 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 09:08:28 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v8] In-Reply-To: <0ahlnaBKJFtHXM-a55hkMJVa6QpedeDLvjKG9B6iGis=.43b84ccc-4254-417c-ad8a-f724723fe70f@github.com> References: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> <0ahlnaBKJFtHXM-a55hkMJVa6QpedeDLvjKG9B6iGis=.43b84ccc-4254-417c-ad8a-f724723fe70f@github.com> Message-ID: On Fri, 28 Jun 2024 09:02:29 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Merge branch 'master' into ssc_v0 >> - comments from martin >> - add2reg -> z_la >> - comments from Lutz >> - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp >> - rename: r_scratch to r_result in repne_scan method >> - Update src/hotspot/cpu/s390/macroAssembler_s390.cpp >> - Removes unused Labels & makes comment more sensible >> - [s390x] secondary super cache port > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3329: > >> 3327: // NOTE: please load 0 only in r_result, for now lookup_secondary_supers_table sets r_result to 0 >> 3328: // clear_reg(r_result, true /* whole_reg */, false /* set_cc */); // let's hope that search will be a success >> 3329: z_cghi(r_result, 0); > > I don't understand this comment. What does "for now" mean? At present, lookup_secondary_supers_table updates r_result just in case it changes in future. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658393165 From amitkumar at openjdk.org Fri Jun 28 09:08:28 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 09:08:28 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v8] In-Reply-To: References: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> <0ahlnaBKJFtHXM-a55hkMJVa6QpedeDLvjKG9B6iGis=.43b84ccc-4254-417c-ad8a-f724723fe70f@github.com> Message-ID: On Fri, 28 Jun 2024 09:05:43 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3329: >> >>> 3327: // NOTE: please load 0 only in r_result, for now lookup_secondary_supers_table sets r_result to 0 >>> 3328: // clear_reg(r_result, true /* whole_reg */, false /* set_cc */); // let's hope that search will be a success >>> 3329: z_cghi(r_result, 0); >> >> I don't understand this comment. What does "for now" mean? > > At present, lookup_secondary_supers_table updates r_result just in case it changes in future. Should I change it ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658393505 From amitkumar at openjdk.org Fri Jun 28 09:12:20 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 09:12:20 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 02:52:18 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3452: >> >>> 3450: z_lgr(Z_ARG4, r_result); >>> 3451: const char* msg = "mismatch"; >>> 3452: load_const_optimized(Z_ARG5, (address)msg); >> >> Did you test this? It breaks when you have a register collision (see my assert_different_registers on PPC64). You can test it by removing the `z_bre` above and checking if the arguments are correct. 
> > I found one log, from when I was implementing this: > > fatal error: mismatch: java.lang.Integer implements java.util.concurrent.Callable: is_subtype_of: 0; linear_search: 0; table_lookup: 1 > > Context: repne_scan wasn't working properly, table_lookup came up to "1" and repne_scan returned "0" so check failed. > > For surety let me just add the asserts as well and see what happens. >You can test it by removing the z_bre above and checking if the arguments are correct. I have tested with your suggestion, in tier1 there were more than 500 failures, all of them were failing with mismatch error only. I didn't see any register clobbering. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658397411 From aph at openjdk.org Fri Jun 28 09:39:20 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 28 Jun 2024 09:39:20 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v8] In-Reply-To: References: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> <0ahlnaBKJFtHXM-a55hkMJVa6QpedeDLvjKG9B6iGis=.43b84ccc-4254-417c-ad8a-f724723fe70f@github.com> Message-ID: On Fri, 28 Jun 2024 09:05:58 GMT, Amit Kumar wrote: >> At present, lookup_secondary_supers_table updates r_result just in case it changes in future. > > Should I change it ? What does "in future" mean? In the next couple of nanoseconds after this instruction executes? When someone changes this code? You need to say, precisely, what this routine provides. That is to say, what is its postcondition. If some logic is not necessary for that postcondition, don't do it. This routine sets result, or it sets flags, or both. Say which of these it is, then make sure that all code paths do so.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658429493 From mdoerr at openjdk.org Fri Jun 28 10:22:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 10:22:23 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 17:34:11 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and uses "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one which supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - remove assembler header > - remove assembler header src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 426: > 424: inline void Assembler::setbcr(Register d, ConditionRegister cr, Condition cc) { > 425: setbcr(d, bi0(cr, cc)); > 426: } Please adapt indentation. It's a bit off. (See previous code.)
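For readers unfamiliar with the "converts != 0 to 1" operation in the subject line, one well-known branch-free idiom is sketched below in portable C++. This is an illustration under stated assumptions, not necessarily the instruction sequence the PR emits: the names `normalize_bool32` and `normalize_bool64` are hypothetical, and the trick relies only on the fact that `x | -x` has its top bit set exactly when `x` is non-zero.

```cpp
#include <cstdint>

// 32-bit variant: for any non-zero x, either x or its two's-complement
// negation has bit 31 set, so shifting the OR right by 31 yields 1.
// For x == 0 both operands are 0 and the result is 0.
inline std::uint32_t normalize_bool32(std::uint32_t x) {
  return (x | (0u - x)) >> 31;
}

// 64-bit variant of the same idea, using bit 63.
inline std::uint64_t normalize_bool64(std::uint64_t x) {
  return (x | (0ull - x)) >> 63;
}
```

Having one helper per width mirrors the request in the description for a 32-bit and a 64-bit function; a single templated helper would be an equally reasonable design.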
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658476978 From amitkumar at openjdk.org Fri Jun 28 10:46:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 10:46:49 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v9] In-Reply-To: References: Message-ID: <6UqicMdY1Y-KI8LlGQOFAGbGNXyk696genOCy5avzxg=.a45176e6-c1d9-4f9d-83b4-c1f9043787c0@github.com>
> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test                avgt  15   0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test     avgt  15   1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt  15   1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt  15   1.410 ± 0.017  ns/op
>
> Benchmark                                   Mode  Cnt  Score   Error  Units
> SecondarySupersLookup.testNegative00        avgt  15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01        avgt  15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02        avgt  15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03        avgt  15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04        avgt  15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05        avgt  15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06        avgt  15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07        avgt  15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08        avgt  15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09        avgt  15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10        avgt  15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16        avgt  15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20        avgt  15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30        avgt  15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32        avgt  15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40        avgt  15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50        avgt  15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55        avgt  15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56        avgt  15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57        avgt  15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58        avgt  15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59        avgt  15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60        avgt  15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61        avgt  15  27.763 ± 1.628  ns...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: updates the comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/1977b505..dd89106a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=07-08 Stats: 9 lines in 1 file changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From sroy at openjdk.org Fri Jun 28 10:47:32 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 10:47:32 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v7] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for).
> > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: register name and indendt ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/3bff218b..8ecaf167 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=05-06 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From mbaesken at openjdk.org Fri Jun 28 11:22:33 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 28 Jun 2024 11:22:33 GMT Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored Message-ID: The following build error has been reported with old gcc used installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes] static void ALWAYSINLINE crash_with_sigfpe() { We can avoid it by not settings the mentioned attribute in case ubsan is not enabled. 
------------- Commit messages: - JDK-8335283 Changes: https://git.openjdk.org/jdk/pull/19937/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19937&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335283 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19937.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19937/head:pull/19937 PR: https://git.openjdk.org/jdk/pull/19937 From shade at openjdk.org Fri Jun 28 11:22:33 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 28 Jun 2024 11:22:33 GMT Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken wrote: > The following build error has been reported with old gcc used > installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes] > static void ALWAYSINLINE crash_with_sigfpe() { > > We can avoid it by not settings the mentioned attribute in case ubsan is not enabled. Looks fine, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19937#pullrequestreview-2147674010 From tschatzl at openjdk.org Fri Jun 28 11:55:19 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Jun 2024 11:55:19 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 03:44:53 GMT, Kim Barrett wrote: > Please review this change that replaces some uses of literal 0 as a null > pointer constant in gc code to instead use nullptr. > > There is also one place where the use was eliminated entirely, because it was > dead code following a [[noreturn]] call. > > Testing: mach5 tier1. Marked as reviewed by tschatzl (Reviewer). 
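For readers unfamiliar with the kind of change described in the 8335294 request above, the following is a minimal, self-contained sketch rather than code from the patch. It shows why a literal `0` and `nullptr` are not interchangeable, and how a trailing return becomes dead code after a `[[noreturn]]` call. The names `picks_pointer_overload`, `fail`, and `find_slot` are hypothetical and do not exist in HotSpot.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical overload pair: a literal 0 is an int first and only
// secondarily a null pointer constant, while nullptr is never an int.
static bool picks_pointer_overload(int)         { return false; }
static bool picks_pointer_overload(const char*) { return true; }

// Hypothetical stand-in for a fatal-error routine that never returns.
[[noreturn]] static void fail(const char* msg) {
  std::fprintf(stderr, "%s\n", msg);
  std::abort();
}

// Returns the slot for a non-negative index and aborts otherwise.
static int* find_slot(int* table, int index) {
  assert(table != nullptr);  // the pre-cleanup spelling would be 'table != 0'
  if (index >= 0) {
    return &table[index];
  }
  fail("negative index");
  // A trailing 'return 0;' here would be dead code, because control
  // never comes back from the [[noreturn]] call above.
}
```

In this sketch `picks_pointer_overload(0)` selects the `int` overload and `picks_pointer_overload(nullptr)` selects the pointer overload, which is the ambiguity that spelling null pointers as `nullptr` avoids.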
------------- PR Review: https://git.openjdk.org/jdk/pull/19934#pullrequestreview-2147806879 From tschatzl at openjdk.org Fri Jun 28 11:57:21 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Jun 2024 11:57:21 GMT Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken wrote: > The following build error has been reported with old gcc used > installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes] > static void ALWAYSINLINE crash_with_sigfpe() { > > We can avoid it by not settings the mentioned attribute in case ubsan is not enabled. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19937#pullrequestreview-2147814299 From coleenp at openjdk.org Fri Jun 28 12:09:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Jun 2024 12:09:19 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 03:44:53 GMT, Kim Barrett wrote: > Please review this change that replaces some uses of literal 0 as a null > pointer constant in gc code to instead use nullptr. > > There is also one place where the use was eliminated entirely, because it was > dead code following a [[noreturn]] call. > > Testing: mach5 tier1. Looks good. Does -Wzero-as-null-constant prevent any new NULL from creeping in also? ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19934#pullrequestreview-2147845270 From sroy at openjdk.org Fri Jun 28 12:17:49 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 12:17:49 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v8] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: indent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/8ecaf167..938f9641 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From mdoerr at openjdk.org Fri Jun 28 12:17:49 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 12:17:49 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v6] In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 10:20:08 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove assembler header >> - remove assembler header > > src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 426: > >> 424: inline void Assembler::setbcr(Register d, ConditionRegister cr, Condition cc) { >> 425: setbcr(d, bi0(cr, cc)); >> 426: } > > Please adapt indentation. It's a bit off. (See previous code.) Still one space before `}`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658652902 From mdoerr at openjdk.org Fri Jun 28 12:17:50 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 12:17:50 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v7] In-Reply-To: References: Message-ID: <85Zhok1CQGA8wI048yqULDlsMVY9PpKbmg4T3hcaVyU=.251cdc31-d75f-48c1-92ae-b22a3b86e789@github.com> On Fri, 28 Jun 2024 10:47:32 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > register name and indendt src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 365: > 363: } > 364: } > 365: } Please find a better place to insert this function! It doesn't fit between the 2 functions which are related. src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 298: > 296: > 297: // Branch-free implementation to convert !=0 to 1. > 298: void normalize_bool(Register dst, Register temp, bool use_64bit); Same here. 
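As an illustration of what "branch-free implementation to convert !=0 to 1" can mean, here is a portable C++ sketch of one well-known technique. It is an assumption for illustration only, not the PPC64 assembler code from this pull request: the function names `normalize_bool64` and `normalize_bool32` are hypothetical, and the real `normalize_bool` emits machine instructions (or `setbcr` on Power10) instead of computing a value.

```cpp
#include <cstdint>

// For any x != 0, either x or its two's-complement negation has the top
// bit set, so (x | -x) has the top bit set exactly when x != 0.
// Shifting that bit down yields 0 or 1 without a branch.
constexpr uint64_t normalize_bool64(uint64_t x) {
  return (x | (uint64_t{0} - x)) >> 63;
}

// 32-bit variant: only the low 32 bits take part in the test.
constexpr uint32_t normalize_bool32(uint32_t x) {
  return (x | (uint32_t{0} - x)) >> 31;
}

// Compile-time checks of the behavior described above.
static_assert(normalize_bool64(0) == 0, "zero stays zero");
static_assert(normalize_bool64(42) == 1, "non-zero becomes one");
static_assert(normalize_bool64(0x100000000ULL) == 1, "upper bits count in 64-bit mode");
static_assert(normalize_bool32(static_cast<uint32_t>(0x100000000ULL)) == 0,
              "upper bits are ignored in 32-bit mode");
static_assert(normalize_bool32(0x80000000u) == 1, "sign bit alone is non-zero");
```

The last two checks show why the pull request description asks for separate 32-bit and 64-bit operations (or one function supporting both): a value whose only set bits are in the upper half normalizes differently depending on the width.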
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658654116 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658654282 From coleenp at openjdk.org Fri Jun 28 12:20:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 28 Jun 2024 12:20:50 GMT Subject: [jdk23] RFR: 8333542: Breakpoint in parallel code does not work Message-ID: Clean backport of JDK-8333542. After this, we need a backport for JDK-8335134 to fix the test. ------------- Commit messages: - Backport b3bf31a0a08da679ec2fd21613243fb17b1135a9 Changes: https://git.openjdk.org/jdk/pull/19938/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19938&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333542 Stats: 516 lines in 16 files changed: 339 ins; 129 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/19938.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19938/head:pull/19938 PR: https://git.openjdk.org/jdk/pull/19938 From sroy at openjdk.org Fri Jun 28 12:23:33 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 12:23:33 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v9] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: arranging functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/938f9641..7315d384 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=07-08 Stats: 47 lines in 2 files changed: 24 ins; 23 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From sroy at openjdk.org Fri Jun 28 12:23:33 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 12:23:33 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v7] In-Reply-To: <85Zhok1CQGA8wI048yqULDlsMVY9PpKbmg4T3hcaVyU=.251cdc31-d75f-48c1-92ae-b22a3b86e789@github.com> References: <85Zhok1CQGA8wI048yqULDlsMVY9PpKbmg4T3hcaVyU=.251cdc31-d75f-48c1-92ae-b22a3b86e789@github.com> Message-ID: On Fri, 28 Jun 2024 12:13:27 GMT, Martin Doerr wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> register name and indendt > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 365: > >> 363: } >> 364: } >> 365: } > > Please find a better place to insert this function! It doesn't fit between the 2 functions which are related. I am not sure how the functions are grouped. I have now placed it under the comment "Optimized instruction emitters", which I think is relevant to this instruction.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658660406 From sroy at openjdk.org Fri Jun 28 12:23:33 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 12:23:33 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v9] In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 12:21:10 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > arranging functions src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 58: > 56: > 57: // Branch-free implementation to convert !=0 to 1. > 58: void normalize_bool(Register dst, Register temp, bool use_64bit); @TheRealMDoerr I have placed it here based on the comment // // Optimized instruction emitters // I think that is more relevant to this. 
Let me know if otherwise ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658661795 From jwaters at openjdk.org Fri Jun 28 12:40:20 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 28 Jun 2024 12:40:20 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 03:44:53 GMT, Kim Barrett wrote: > Please review this change that replaces some uses of literal 0 as a null > pointer constant in gc code to instead use nullptr. > > There is also one place where the use was eliminated entirely, because it was > dead code following a [[noreturn]] call. > > Testing: mach5 tier1. I would keep the return nullptr in that one place that it was removed (I'm somewhat certain that leaving out a return in a method that has a non void return type isn't allowed in C++, even if there was a call to a noreturn method directly before the return), but besides that this seems ok ------------- Marked as reviewed by jwaters (Committer). PR Review: https://git.openjdk.org/jdk/pull/19934#pullrequestreview-2147905101 From jwaters at openjdk.org Fri Jun 28 12:43:21 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 28 Jun 2024 12:43:21 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: References: Message-ID: <1ZaZycYoA4m6BbYBsPu85wX3QQ2uYPylEXLoKhaSb3Y=.73794fe1-da83-4c54-8a30-fdfbbf2746e2@github.com> On Fri, 28 Jun 2024 12:06:50 GMT, Coleen Phillimore wrote: > Looks good. Does -Wzero-as-null-constant prevent any new NULL from creeping in also? My guess is that it only warns for a raw 0 being used as a null pointer, but not NULL itself. 
It would seem odd to have an error in gcc against using a feature from the C Standard Library, after all ------------- PR Comment: https://git.openjdk.org/jdk/pull/19934#issuecomment-2196819387 From mdoerr at openjdk.org Fri Jun 28 12:50:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 12:50:19 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v9] In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 12:23:33 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > arranging functions src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 87: > 85: } > 86: } > 87: If you want to place it below `MacroAssembler::set_cmpu3`: It's in macroAssembler_ppc.inline.hpp. (Requires making it `inline` which should be ok.) 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658692532 From mdoerr at openjdk.org Fri Jun 28 12:50:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 12:50:20 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v9] In-Reply-To: References: Message-ID: <6BDIRhmS6MveBgpuphYqwjp8XXdgF5_Egu978jp0Ny8=.c5267ef4-6912-4902-9910-e487baa6c4ee@github.com> On Fri, 28 Jun 2024 12:21:10 GMT, Suchismith Roy wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> arranging functions > > src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 58: > >> 56: >> 57: // Branch-free implementation to convert !=0 to 1. >> 58: void normalize_bool(Register dst, Register temp, bool use_64bit); > > @TheRealMDoerr I have placed it here based on the comment // > // Optimized instruction emitters > // > > I think that is more relevant to this. Let me know if otherwise I'd place them below `set_cmpu3` which is a similar function. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658689985 From mdoerr at openjdk.org Fri Jun 28 13:12:25 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 13:12:25 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v8] In-Reply-To: References: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> <0ahlnaBKJFtHXM-a55hkMJVa6QpedeDLvjKG9B6iGis=.43b84ccc-4254-417c-ad8a-f724723fe70f@github.com> Message-ID: On Fri, 28 Jun 2024 09:36:41 GMT, Andrew Haley wrote: >> Should I change it ? > > What does "in future" mean? In the next couple of nanoseconds after this instruction executes? When someone changes this code? > You need to say, precisely, what this routine provides. That is to say, what is its postcondition. If some logic is not necessary for that postcondition, don't do it. 
> This routine sets result, or it sets flags, or both. Say which of these it is, the make sure that all code paths do so. I think the `z_cghi` is only used for the `asm_assert`. I'd guard both with `ifdef ASSERT`. Or combine it with the assert block below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658720301 From sroy at openjdk.org Fri Jun 28 14:30:30 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 14:30:30 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v10] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: - arranging functions - arranging functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/7315d384..f0b1b087 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=08-09 Stats: 49 lines in 3 files changed: 24 ins; 24 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From mdoerr at openjdk.org Fri Jun 28 14:30:30 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 14:30:30 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v10] In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 14:25:58 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with two additional commits since the last revision: > > - arranging functions > - arranging functions src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 183: > 181: // Branch-free implementation to convert !=0 to 1. 
> 182: void inline normalize_bool(Register dst, Register temp, bool use_64bit); > 183: inline void pd_patch_instruction(address branch, address target, const char* file, int line); Please add an empty line to separate it from `pd_patch_instruction` which is something completely different. And remove trailing whitespaces above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658842945 From sroy at openjdk.org Fri Jun 28 14:40:32 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 14:40:32 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v11] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: nits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/f0b1b087..b85c2726 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=09-10 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From sroy at openjdk.org Fri Jun 28 14:43:33 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 14:43:33 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: nits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/b85c2726..21e0c8ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From mdoerr at openjdk.org Fri Jun 28 14:58:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Jun 2024 14:58:20 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12] In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 14:43:33 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > nits I think this looks good, now. Thanks! I'll run tests over the weekend and I can approve it when they have passed. @offamitkumar: Would be nice if you could provide a 2nd review since you have already taken a look. 
I haven't checked what kind of code s390 uses. Maybe something similar may make sense.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19886#issuecomment-2197123855

From amitkumar at openjdk.org Fri Jun 28 15:04:57 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 28 Jun 2024 15:04:57 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v10]
In-Reply-To:
References:
Message-ID:

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test                 avgt   15   0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test      avgt   15   1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1   avgt   15   1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2   avgt   15   1.410 ± 0.017  ns/op
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00         avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01         avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02         avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03         avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04         avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05         avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06         avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07         avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08         avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09         avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10         avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16         avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20         avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30         avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32         avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40         avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50         avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55         avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56         avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57         avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58         avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59         avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60         avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61         avgt   15  27.763 ± 1.628  ns...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  move check into ASSERT block

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/dd89106a..80329c3c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=08-09

  Stats: 11 lines in 1 file changed: 5 ins; 5 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From amitkumar at openjdk.org Fri Jun 28 15:04:57 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 28 Jun 2024 15:04:57 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v8]
In-Reply-To:
References: <2RpIwN7sJviNmnwF-aOVEpd6VufubBjKfNwr-6Uc2dE=.d282f59f-e712-4ff6-a9ce-e920464d0c0b@github.com> <0ahlnaBKJFtHXM-a55hkMJVa6QpedeDLvjKG9B6iGis=.43b84ccc-4254-417c-ad8a-f724723fe70f@github.com>
Message-ID:

On Fri, 28 Jun 2024 13:08:57 GMT, Martin Doerr wrote:

>> What does "in future" mean? In the next couple of nanoseconds after this instruction executes? When someone changes this code?
>> You need to say, precisely, what this routine provides. That is to say, what is its postcondition. If some logic is not necessary for that postcondition, don't do it.
>> This routine sets result, or it sets flags, or both. Say which of these it is, then make sure that all code paths do so.
>
> I think the `z_cghi` is only used for the `asm_assert`. I'd guard both with `ifdef ASSERT`. Or combine it with the assert block below.

Done;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658885951

From mdoerr at openjdk.org Fri Jun 28 15:04:58 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 28 Jun 2024 15:04:58 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jun 2024 15:01:47 GMT, Amit Kumar wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test                 avgt   15   0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test      avgt   15   1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1   avgt   15   1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2   avgt   15   1.410 ± 0.017  ns/op
>>
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00         avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01         avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02         avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03         avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04         avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05         avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06         avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07         avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08         avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09         avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10         avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16         avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20         avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30         avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32         avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40         avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50         avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55         avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56         avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57         avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58         avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59         avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60         avgt   15  27.118 ± 0.557  ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   move check into ASSERT block

src/hotspot/cpu/s390/stubGenerator_s390.cpp line 3296:

> 3294:   if (UseSecondarySupersTable) {
> 3295:     StubRoutines::_lookup_secondary_supers_table_slow_path_stub = generate_lookup_secondary_supers_table_slow_path_stub();
> 3296:     if (! InlineSecondarySupersTest) {

Extra whitespace.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658888610

From amitkumar at openjdk.org Fri Jun 28 15:08:54 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 28 Jun 2024 15:08:54 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v11]
In-Reply-To:
References:
Message-ID:

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test                 avgt   15   0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test      avgt   15   1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1   avgt   15   1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2   avgt   15   1.410 ± 0.017  ns/op
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00         avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01         avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02         avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03         avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04         avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05         avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06         avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07         avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08         avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09         avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10         avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16         avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20         avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30         avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32         avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40         avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50         avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55         avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56         avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57         avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58         avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59         avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60         avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61         avgt   15  27.763 ± 1.628  ns...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  extra whitespace

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/80329c3c..8c7f5509

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=09-10

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From amitkumar at openjdk.org Fri Jun 28 15:08:54 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 28 Jun 2024 15:08:54 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jun 2024 15:01:47 GMT, Martin Doerr wrote:

>> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   move check into ASSERT block
>
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 3296:
>
>> 3294:   if (UseSecondarySupersTable) {
>> 3295:     StubRoutines::_lookup_secondary_supers_table_slow_path_stub = generate_lookup_secondary_supers_table_slow_path_stub();
>> 3296:     if (! InlineSecondarySupersTest) {
>
> Extra whitespace.

fixed, I copy pasted it from aarch64.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1658893302

From amitkumar at openjdk.org Fri Jun 28 15:39:22 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 28 Jun 2024 15:39:22 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jun 2024 14:43:33 GMT, Suchismith Roy wrote:

>> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732)
>> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for).
>>
>> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN.
>> Power10 has the "setbc" / "setbcr" instruction.
>>
>> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both).
>>
>> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function.
>
> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:
>
>   nits

Things you may consider:

1. `is_64bit` also could be set to a default value; i.e. by default make it `true`;
2. I can see that `R0` is used temp register consistently, maybe make it default;

src/hotspot/cpu/ppc/assembler_ppc.hpp line 353:

> 351:   SETBC_OPCODE = (31u << OPCODE_SHIFT | 384u << 1),
> 352:   SETNBC_OPCODE = (31u << OPCODE_SHIFT | 448u << 1),
> 353:   SETBCR_OPCODE = (31u << OPCODE_SHIFT | 416u << 1),

update copyright header.
src/hotspot/cpu/ppc/assembler_ppc.inline.hpp line 422: > 420: setnbc(d, bi0(cr, cc)); > 421: } > 422: inline void Assembler::setbcr(Register d, int biint) update copyright headers src/hotspot/cpu/ppc/macroAssembler_ppc.hpp line 182: > 180: void inline set_cmpu3(Register dst, bool treat_unordered_like_less); > 181: // Branch-free implementation to convert !=0 to 1. > 182: void inline normalize_bool(Register dst, Register temp, bool use_64bit); Suggestion: void inline normalize_bool(Register dst, Register temp, bool is_64bit); src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 268: > 266: > 267: // Branch-free implementation to convert !=0 to 1 > 268: // Set register dst to 1 if dst is non-zero. Use setbcr instruction on Power10. Suggestion: // Set register dst to 1 if dst is non-zero. Uses setbcr instruction on Power10. ------------- Changes requested by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/19886#pullrequestreview-2148294357 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658922332 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658919731 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658912749 PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658906625 From sgehwolf at openjdk.org Fri Jun 28 15:41:48 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 28 Jun 2024 15:41:48 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v8] In-Reply-To: References: Message-ID: > Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. 
For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: > > > [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present > > > This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: > > > java -XshowSettings:system --version > Operating System Metrics: > Provider: cgroupv1 > System not containerized. > openjdk 23-internal 2024-09-17 > OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) > > > The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. > > Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. > > Testing: > > - [x] GHA (risc-v failure seems infra related) > - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) > - [x] Some manual testing using cri-o > > Thoughts? Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 18 commits:

 - Merge branch 'master' into jdk-8261242-is-containerized-fix
 - Refactor mount info matching to helper function
 - Merge branch 'master' into jdk-8261242-is-containerized-fix
 - Remove problem listing of PlainRead which is reworked here
 - Merge branch 'master' into jdk-8261242-is-containerized-fix
 - Merge branch 'master' into jdk-8261242-is-containerized-fix
 - Add doc for mountinfo scanning.
 - Unify naming of variables
 - Merge branch 'master' into jdk-8261242-is-containerized-fix
 - Merge branch 'master' into jdk-8261242-is-containerized-fix
 - ... and 8 more: https://git.openjdk.org/jdk/compare/486aa11e...1017da35

-------------

Changes: https://git.openjdk.org/jdk/pull/18201/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=07
  Stats: 411 lines in 20 files changed: 305 ins; 79 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/18201.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18201/head:pull/18201

PR: https://git.openjdk.org/jdk/pull/18201

From sroy at openjdk.org Fri Jun 28 15:45:23 2024
From: sroy at openjdk.org (Suchismith Roy)
Date: Fri, 28 Jun 2024 15:45:23 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12]
In-Reply-To:
References:
Message-ID: <4DE7YaT_Ml5M5QVe_k_BMci-XhJZMaQJ9E3_Ex9D614=.cb0cbb8f-2266-481d-8164-2222f0c7e969@github.com>

On Fri, 28 Jun 2024 15:29:55 GMT, Amit Kumar wrote:

>> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   nits
>
> src/hotspot/cpu/ppc/assembler_ppc.hpp line 353:
>
>> 351:   SETBC_OPCODE = (31u << OPCODE_SHIFT | 384u << 1),
>> 352:   SETNBC_OPCODE = (31u << OPCODE_SHIFT | 448u << 1),
>> 353:   SETBCR_OPCODE = (31u << OPCODE_SHIFT | 416u << 1),
>
> update copyright header.

Adding IBM Corp or just the Oracle header?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658938136

From sgehwolf at openjdk.org Fri Jun 28 15:45:25 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Fri, 28 Jun 2024 15:45:25 GMT
Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Jun 2024 18:40:09 GMT, Larry Cable wrote:

>> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>>
>>  - Refactor mount info matching to helper function
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - Remove problem listing of PlainRead which is reworked here
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - Add doc for mountinfo scanning.
>>  - Unify naming of variables
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - Merge branch 'master' into jdk-8261242-is-containerized-fix
>>  - jcheck fixes
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/baafa662...532ea33b
>
> src/hotspot/share/prims/jvm.cpp line 504:
>
>> 502: JVM_LEAF(jboolean, JVM_IsContainerized(void))
>> 503: #ifdef LINUX
>> 504:   if (OSContainer::is_containerized()) {
>
> // nit: personal preference...
>
> return OSContainer::isContainerized() ? JNI_TRUE : JNI_FALSE;

I've kept this as is, since the suggestion generates this code after preprocessing on Linux:

    return OSContainer::is_containerized() ? JNI_TRUE : JNI_FALSE;
    return JNI_FALSE;

over the existing version:

    if (OSContainer::is_containerized()) {
      return JNI_TRUE;
    }
    return JNI_FALSE;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1658938198

From sroy at openjdk.org Fri Jun 28 15:48:20 2024
From: sroy at openjdk.org (Suchismith Roy)
Date: Fri, 28 Jun 2024 15:48:20 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jun 2024 15:36:41 GMT, Amit Kumar wrote:

> Things you may consider:
>
> 1. `is_64bit` also could be set to a default value; i.e. by default make it `true`;
> 2. I can see that `R0` is used temp register consistently, maybe make it default;

1. I am not sure if we have a defined default behaviour for it. We are doing the appropriate behaviour based on the version check. So by definition it does not sound to me like there is a default behaviour.
2. Yes, R0 is used, but I think we should have the flexibility to use any register. Making R0 the default sounds like R0 is defined just for this, which is not true.

@TheRealMDoerr to comment further.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19886#issuecomment-2197210289

From sgehwolf at openjdk.org Fri Jun 28 15:49:21 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Fri, 28 Jun 2024 15:49:21 GMT
Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jun 2024 15:41:48 GMT, Severin Gehwolf wrote:

>> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker.
For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. >> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 18 commits: > > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Refactor mount info matching to helper function > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Remove problem listing of PlainRead which is reworked here > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - ... and 8 more: https://git.openjdk.org/jdk/compare/486aa11e...1017da35 @adinn @iklam Could one of you please help with a second review, please? Not sure if @larry-cable review gets recorded with him having no OpenJDK project role :-/ Thanks in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2197212014 From szaldana at openjdk.org Fri Jun 28 15:53:18 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 28 Jun 2024 15:53:18 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 11:35:06 GMT, Thomas Stuefe wrote: > See JBS issue. > > It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. > > The patch: > - exposes os::available_memory via Whitebox > - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` > > I have some misgivings about this solution, though: > 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. 
> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) > 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. > > Despite my doubts, I think this is the best we can come up with if we want to have such a test. > > Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. Hi Thomas, I'm not a Reviewer but this looks good to me. Just a small nit. 
test/hotspot/jtreg/gc/TestAlwaysPreTouchBehavior.java line 122:

> 120:
> 121:     final static long M = 1024 * 1024;
> 122:     final static long G = M * 1024L;

This field is never used

-------------

PR Review: https://git.openjdk.org/jdk/pull/19803#pullrequestreview-2148329492
PR Review Comment: https://git.openjdk.org/jdk/pull/19803#discussion_r1658927365

From amitkumar at openjdk.org Fri Jun 28 16:08:20 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 28 Jun 2024 16:08:20 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12]
In-Reply-To: <4DE7YaT_Ml5M5QVe_k_BMci-XhJZMaQJ9E3_Ex9D614=.cb0cbb8f-2266-481d-8164-2222f0c7e969@github.com>
References: <4DE7YaT_Ml5M5QVe_k_BMci-XhJZMaQJ9E3_Ex9D614=.cb0cbb8f-2266-481d-8164-2222f0c7e969@github.com>
Message-ID:

On Fri, 28 Jun 2024 15:42:53 GMT, Suchismith Roy wrote:

>> src/hotspot/cpu/ppc/assembler_ppc.hpp line 353:
>>
>>> 351:   SETBC_OPCODE = (31u << OPCODE_SHIFT | 384u << 1),
>>> 352:   SETNBC_OPCODE = (31u << OPCODE_SHIFT | 448u << 1),
>>> 353:   SETBCR_OPCODE = (31u << OPCODE_SHIFT | 416u << 1),
>>
>> update copyright header.
>
> Adding IBM Corp or just the Oracle header?

Oh, I was just asking about updating the copyright header years. Make it 2024 for both SAP and Oracle.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1658963961

From amitkumar at openjdk.org Fri Jun 28 16:16:19 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 28 Jun 2024 16:16:19 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12]
In-Reply-To:
References:
Message-ID: <9wgcfV-OPtfUHejKhgWZd9YsSBNZGJPpaoSOWNT9oAE=.f0241f89-4c2f-418a-951b-7ae76dc43744@github.com>

On Fri, 28 Jun 2024 15:45:55 GMT, Suchismith Roy wrote:

> > Things you may consider:
> >
> > 1. `is_64bit` also could be set to a default value; i.e. by default make it `true`;
> > 2. I can see that `R0` is used temp register consistently, maybe make it default;
>
> 1. I am not sure if we have a defined default behaviour for it. We are doing the appropriate behaviour based on the version check. So by definition it does not sound to me like there is a default behaviour.

My intention was to update the normalize_bool method like this:

    void inline normalize_bool(Register dst, Register temp=R0, bool is_64bit = true);

That way you don't have to pass it everywhere.

> 2. Yes, R0 is used, but I think we should have the flexibility to use any register. Making R0 the default sounds like R0 is defined just for this, which is not true.

I didn't get it. It will be like you have a default register present, but if the situation is such that you can't use `R0`, then you pass a custom register.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19886#issuecomment-2197252730

From sroy at openjdk.org Fri Jun 28 16:33:38 2024
From: sroy at openjdk.org (Suchismith Roy)
Date: Fri, 28 Jun 2024 16:33:38 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v13]
In-Reply-To:
References:
Message-ID: <-YF0bQsgKc-QGH5OX8d3IDkPkIuxRFiHnBm7SHfJxvM=.da8348de-561e-44f1-8f31-196e9bd98f36@github.com>

> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732)
> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for).
>
> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN.
> Power10 has the "setbc" / "setbcr" instruction.
>
> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both).
>
> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function.
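
As a rough illustration of the "convert != 0 to 1" idea in the description above, here is a minimal, portable C++ sketch. It is not the PR's PPC64 assembly: the names `normalize_bool_32` and `normalize_bool_64` are made up for this example, and the Power10 `setbcr` path is not modeled. The assumption is the usual bit trick that `x | -x` has its top bit set exactly when `x` is non-zero, so a logical right shift by (width - 1) yields 0 or 1 without a branch.

```cpp
#include <cstdint>

// Hypothetical helper (not HotSpot code): 64-bit branch-free "!= 0 -> 1".
// For unsigned x, (0 - x) is the two's-complement negation; x | -x has the
// most significant bit set if and only if x is non-zero.
constexpr std::uint64_t normalize_bool_64(std::uint64_t x) {
  return (x | (std::uint64_t{0} - x)) >> 63;
}

// Hypothetical helper (not HotSpot code): the same idea on 32-bit values.
// Only the shift amount changes with the operand width.
constexpr std::uint32_t normalize_bool_32(std::uint32_t x) {
  return (x | (std::uint32_t{0} - x)) >> 31;
}
```

The two widths differ only in which bit is shifted down, which is presumably why the thread talks about one helper taking an `is_64bit` flag rather than two unrelated functions.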
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision:

  copyright anddefault behaviour

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19886/files
  - new: https://git.openjdk.org/jdk/pull/19886/files/21e0c8ec..711cd148

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=11-12

  Stats: 10 lines in 4 files changed: 2 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/19886.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886

PR: https://git.openjdk.org/jdk/pull/19886

From sroy at openjdk.org Fri Jun 28 16:33:38 2024
From: sroy at openjdk.org (Suchismith Roy)
Date: Fri, 28 Jun 2024 16:33:38 GMT
Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v12]
In-Reply-To: <9wgcfV-OPtfUHejKhgWZd9YsSBNZGJPpaoSOWNT9oAE=.f0241f89-4c2f-418a-951b-7ae76dc43744@github.com>
References: <9wgcfV-OPtfUHejKhgWZd9YsSBNZGJPpaoSOWNT9oAE=.f0241f89-4c2f-418a-951b-7ae76dc43744@github.com>
Message-ID:

On Fri, 28 Jun 2024 16:14:04 GMT, Amit Kumar wrote:

> My intention was to update normalize_bool method like this:

Yeah, I got that. I agree on is_64bit now; it makes sense to me. For the register R0, I am not sure if we should define it that way, or whether it can interfere with any other operation. I was having some build failures when I was changing the Ret register, so I am not sure if the same can happen with the temporary register being used?
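
To make the default-argument question from this exchange concrete, the following self-contained C++ sketch shows how trailing defaults behave at call sites. `Register`, `R0`, `R3`, `R11`, and the `Call` record are simplified stand-ins invented for this example, not HotSpot's real types, and the function only records its arguments instead of emitting instructions.

```cpp
// Simplified stand-in for HotSpot's Register type (assumption for this sketch).
struct Register { int encoding; };

constexpr Register R0{0};    // default temp, as suggested in the thread
constexpr Register R3{3};    // example destination register
constexpr Register R11{11};  // example alternative temp register

// Records what a call resolved to; the real function would emit code instead.
struct Call { int dst; int temp; bool is_64bit; };

// Signature shape proposed in the thread: temp and is_64bit carry defaults.
constexpr Call normalize_bool(Register dst, Register temp = R0, bool is_64bit = true) {
  return Call{dst.encoding, temp.encoding, is_64bit};
}

// Example call sites; defaults can only be omitted from the right.
constexpr Call a = normalize_bool(R3);             // temp = R0, 64-bit
constexpr Call b = normalize_bool(R3, R11);        // custom temp, 64-bit
constexpr Call c = normalize_bool(R3, R0, false);  // 32-bit: temp must be spelled out
```

A default does not reserve `R0` for this purpose; it only saves typing where `R0` is free, and any caller can still pass another register. One consequence of positional defaults is visible in the last call: selecting the 32-bit variant requires naming the temp register explicitly.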
------------- PR Comment: https://git.openjdk.org/jdk/pull/19886#issuecomment-2197276829 From sroy at openjdk.org Fri Jun 28 16:44:50 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 16:44:50 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v14] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: remove debug code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/711cd148..da88f98c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From sroy at openjdk.org Fri Jun 28 17:34:35 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 17:34:35 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v15] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: use default call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/da88f98c..7daa9ddc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=13-14 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From sroy at openjdk.org Fri Jun 28 17:39:54 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 17:39:54 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v16] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: default value usage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/7daa9ddc..5cd36a03 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=14-15 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From sroy at openjdk.org Fri Jun 28 17:50:48 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 17:50:48 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: References: Message-ID: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. > Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: default value usage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/5cd36a03..f5af70c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From amitkumar at openjdk.org Fri Jun 28 18:04:22 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 18:04:22 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> Message-ID: On Fri, 28 Jun 2024 17:50:48 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
> > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > default value usage src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2386: > 2384: > 2385: // convert !=0 to 1 > 2386: normalize_bool(result, true); wait, is build successful with this code change ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659131584 From duke at openjdk.org Fri Jun 28 18:04:23 2024 From: duke at openjdk.org (Larry Cable) Date: Fri, 28 Jun 2024 18:04:23 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v8] In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 15:41:48 GMT, Severin Gehwolf wrote: >> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: >> >> >> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present >> >> >> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example: >> >> >> java -XshowSettings:system --version >> Operating System Metrics: >> Provider: cgroupv1 >> System not containerized. 
>> openjdk 23-internal 2024-09-17 >> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) >> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) >> >> >> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. >> >> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. >> >> Testing: >> >> - [x] GHA (risc-v failure seems infra related) >> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) >> - [x] Some manual testing using cri-o >> >> Thoughts? > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Refactor mount info matching to helper function > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Remove problem listing of PlainRead which is reworked here > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Add doc for mountinfo scanning. > - Unify naming of variables > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - Merge branch 'master' into jdk-8261242-is-containerized-fix > - ... and 8 more: https://git.openjdk.org/jdk/compare/486aa11e...1017da35 On 6/28/24 8:47 AM, Severin Gehwolf wrote: > > @adinn > > @iklam > > Could one of you please help with a second review, please? 
Not sure if > @larry-cable > > review gets recorded with him having no OpenJDK project role :-/ > Thanks in advance! > yeah sorry - I'm a "newbie" ... only since 1.1 ... :)
------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2197407053 From amitkumar at openjdk.org Fri Jun 28 18:09:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 28 Jun 2024 18:09:19 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> Message-ID: <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> On Fri, 28 Jun 2024 18:02:11 GMT, Amit Kumar wrote: >> Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: >> >> default value usage > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2386: > >> 2384: >> 2385: // convert !=0 to 1 >> 2386: normalize_bool(result, true); > > wait, is build successful with this code change ? I think you can't specify the parameters arbitrarily like this. If you want to pass "true" then you have to specify the register as well; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659135752 From stuefe at openjdk.org Fri Jun 28 19:22:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Jun 2024 19:22:32 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: References: Message-ID: > See JBS issue. > > It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. > > The patch: > - exposes os::available_memory via Whitebox > - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM.
Otherwise, we throw a `SkippedException` > > I have some misgivings about this solution, though: > 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. > 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) > 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. > > Despite my doubts, I think this is the best we can come up with if we want to have such a test. > > Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Update TestAlwaysPreTouchBehavior.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19803/files - new: https://git.openjdk.org/jdk/pull/19803/files/30bc66df..19ed5833 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19803&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19803.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19803/head:pull/19803 PR: https://git.openjdk.org/jdk/pull/19803 From stuefe at openjdk.org Fri Jun 28 19:22:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 28 Jun 2024 19:22:32 GMT Subject: RFR: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing [v2] In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 15:51:09 GMT, Sonia Zaldana Calles wrote: > Hi Thomas, > > I'm not a Reviewer but this looks good to me. Just a small nit. Thank you, @SoniaZaldana ! Good catch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19803#issuecomment-2197495388 From sroy at openjdk.org Fri Jun 28 20:26:18 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 20:26:18 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> Message-ID: On Fri, 28 Jun 2024 18:07:02 GMT, Amit Kumar wrote: >> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2386: >> >>> 2384: >>> 2385: // convert !=0 to 1 >>> 2386: normalize_bool(result, true); >> >> wait, is build successful with this code change ? 
> > I think you can't specify the parameters arbitrarily like this. If you want to pass "true" then you have to specify the register as well; Build is succeeding. And I also did a register check for templateGenerator with this example using normalize_bool(result, false) (java -version was not hitting this code). Seems to be working fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659283834 From sroy at openjdk.org Fri Jun 28 20:39:20 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Fri, 28 Jun 2024 20:39:20 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> Message-ID: On Fri, 28 Jun 2024 20:23:42 GMT, Suchismith Roy wrote: >> I think you can't specify the parameters arbitrarily like this. If you want to pass "true" then you have to specify the register as well; > > Build is succeeding. And I also did a register check for templateGenerator with this example using normalize_bool(result, false) (java -version was not hitting this code). Seems to be working fine. I think the 2nd value is taken by default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659295984 From kbarrett at openjdk.org Fri Jun 28 22:57:22 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 28 Jun 2024 22:57:22 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: <1ZaZycYoA4m6BbYBsPu85wX3QQ2uYPylEXLoKhaSb3Y=.73794fe1-da83-4c54-8a30-fdfbbf2746e2@github.com> References: <1ZaZycYoA4m6BbYBsPu85wX3QQ2uYPylEXLoKhaSb3Y=.73794fe1-da83-4c54-8a30-fdfbbf2746e2@github.com> Message-ID: On Fri, 28 Jun 2024 12:40:22 GMT, Julian Waters wrote: > Looks good.
Does -Wzero-as-null-constant prevent any new NULL from creeping in also? Thanks. As mentioned in the umbrella bug (https://bugs.openjdk.org/browse/JDK-8332189), enabling that warning doesn't help with backsliding on NULL usage. I've not (yet) found a way to do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19934#issuecomment-2197749060 From kbarrett at openjdk.org Fri Jun 28 23:08:17 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 28 Jun 2024 23:08:17 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 12:37:41 GMT, Julian Waters wrote: > I would keep the return nullptr in that one place that it was removed (I'm somewhat certain that leaving out a return in a method that has a non void return type isn't allowed in C++, even if there was a call to a noreturn method directly before the return), but besides that this seems ok Having execution run off the end of a function not returning void is UB. Ending with a [[noreturn]] function prevents that from happening (assuming the [[noreturn]] attribute is correctly applied, else that is itself UB). We should instead be removing these "spurious" continuations / returns as encountered, to eliminate any questions about whether, for example, a function _can_ return null. I don't recommend going on a hunt to clean these up, as that would likely be a lot of code churn. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19934#issuecomment-2197755214 From cjplummer at openjdk.org Fri Jun 28 23:25:18 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 28 Jun 2024 23:25:18 GMT Subject: [jdk23] RFR: 8333542: Breakpoint in parallel code does not work In-Reply-To: References: Message-ID: <-Dn8a4f8XmS0SCf8cffIrMFVKBe9_o8_1yfsNlKjZ34=.1c4ebc2e-b500-45c1-8d9f-ea034d8810cc@github.com> On Fri, 28 Jun 2024 12:14:55 GMT, Coleen Phillimore wrote: > Clean backport of JDK-8333542. 
After this, we need a backport for JDK-8335134 to fix the test. Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19938#pullrequestreview-2149101087 From kbarrett at openjdk.org Sat Jun 29 00:56:20 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 29 Jun 2024 00:56:20 GMT Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken wrote: > The following build error has been reported with old gcc used > installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes] > static void ALWAYSINLINE crash_with_sigfpe() { > > We can avoid it by not settings the mentioned attribute in case ubsan is not enabled. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19937#pullrequestreview-2149139289 From dlong at openjdk.org Sat Jun 29 01:58:32 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 29 Jun 2024 01:58:32 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines [v19] In-Reply-To: References: <-SCwSVP6zTNao7X4VlIPRor8OU9vDg79oGVDWTf4XCM=.b72cbf6f-66e7-4fb3-b387-e00ae0537ac6@github.com> Message-ID: On Fri, 28 Jun 2024 07:02:03 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1275: >> >>> 1273: // >>> 1274: // Return: the call PC or null if CodeCache is full. >>> 1275: address patchable_far_call(Address entry) { >> >> For runtime_call_type, I don't think we ever update the target/destination, so the name "patchable" seems not quite right for them. Also, for runtime_call_type, since they never change, we can decide early if a near call is possible when the destination is always reachable (based on the bounds of code cache), which is what aarch64 does for trampoline_call. > > Yes. 
My thinking was, the site is still patchable, even if some sites don't need that capability. > The reason why this patch ignores near calls is because of the short reach of JAL +-1MB (so normally only a few stubs can be reached from a few nmethods). > But it is on the enhancement list. > > I don't mind changing the name, feel free to suggest something! The key thing seems to be that they are typed with a relocInfo, so maybe `reloc_call`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19453#discussion_r1659486646 From amitkumar at openjdk.org Sat Jun 29 02:51:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 02:51:25 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> Message-ID: On Fri, 28 Jun 2024 20:37:08 GMT, Suchismith Roy wrote: >> Build is succeeding. And I also did a register check for templateGenerator with this example using normalize_bool(result, false) (java -version was not hitting this code). Seems to be working fine. > > I think the 2nd value is taken by default. amitkumar at Amits-MacBook-Pro ~ % cat tt.cpp #include using namespace std; void fun(int a, char b = 'a', float c = 10.3); int main(void) { fun(10); fun(10, 103.483); return 0; } void fun(int a, char b, float c) { } amitkumar at Amits-MacBook-Pro ~ % g++ tt.cpp tt.cpp:8:11: warning: implicit conversion from 'double' to 'char' changes value from 103.483 to 103 [-Wliteral-conversion] fun(10, 103.483); ~~~ ^~~~~~~ 1 warning generated. That's weird!!!
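The behaviour seen with `tt.cpp` follows from how C++ binds default arguments: arguments are matched to parameters strictly left to right, so a trailing parameter can only be set once every parameter before it is given explicitly, and a two-argument call compiles whenever the second argument converts implicitly to the second parameter's type. A minimal sketch, assuming plain `int` as a stand-in for a register type (`describe`, `two_arg_call`, and `three_arg_call` are hypothetical names, not HotSpot code):

```cpp
#include <string>

// Hypothetical stand-in for a helper with two trailing default arguments.
// Plain int replaces the register type purely for illustration.
std::string describe(int dst, int temp = 0, bool is_64bit = true) {
  return "dst=" + std::to_string(dst) +
         " temp=" + std::to_string(temp) +
         " is_64bit=" + (is_64bit ? "true" : "false");
}

// Arguments bind left to right, so a lone `true` lands in `temp`, not in
// `is_64bit`; bool converts implicitly to int here (true becomes 1).
std::string two_arg_call() { return describe(3, true); }

// To set the last parameter, every parameter before it must be spelled out.
std::string three_arg_call() { return describe(3, 0, false); }
```

Whether the same thing happens with `normalize_bool(result, true)` depends on whether `bool` converts implicitly to the actual register type, which this sketch does not establish.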
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659497821 From amitkumar at openjdk.org Sat Jun 29 03:01:24 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 03:01:24 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> Message-ID: On Fri, 28 Jun 2024 17:50:48 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. 
> > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > default value usage src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 279: > 277: setbcr(dst, CCR0, Assembler::equal); > 278: } else { > 279: neg(temp, dst); I guess it would be better if we add an assert here, `assert_different_registers(dst, temp);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659498940 From amitkumar at openjdk.org Sat Jun 29 03:40:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 03:40:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 04:17:30 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags make/hotspot/gensrc/GensrcAdlc.gmk line 205: > 203: ifeq ($(call check-jvm-feature, g1gc), true) > 204: AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \ > 205: $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU).ad \ on s390, `g1_s390.ad` file is not compiled with current code.
Suggestion: $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU_ARCH).ad \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1659508638 From amitkumar at openjdk.org Sat Jun 29 03:54:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 03:54:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: Message-ID: On Sat, 29 Jun 2024 03:37:59 GMT, Amit Kumar wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags > > make/hotspot/gensrc/GensrcAdlc.gmk line 205: > >> 203: ifeq ($(call check-jvm-feature, g1gc), true) >> 204: AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \ >> 205: $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU).ad \ > > on s390, `g1_s390.ad` file is not compiled with current code.
> > Suggestion: > > $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU_ARCH).ad \ I guess this one might be better: diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk b/make/hotspot/gensrc/GensrcAdlc.gmk index e34f0725397..ef9c15b2975 100644 --- a/make/hotspot/gensrc/GensrcAdlc.gmk +++ b/make/hotspot/gensrc/GensrcAdlc.gmk @@ -203,6 +203,7 @@ ifeq ($(call check-jvm-feature, compiler2), true) ifeq ($(call check-jvm-feature, g1gc), true) AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \ $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU).ad \ + $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU_ARCH).ad \ ))) endif Build is fine with both changes, (tested on Mac-M1) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1659516461 From kbarrett at openjdk.org Sat Jun 29 05:07:21 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 29 Jun 2024 05:07:21 GMT Subject: RFR: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 03:44:53 GMT, Kim Barrett wrote: > Please review this change that replaces some uses of literal 0 as a null > pointer constant in gc code to instead use nullptr. > > There is also one place where the use was eliminated entirely, because it was > dead code following a [[noreturn]] call. > > Testing: mach5 tier1. Thanks for reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19934#issuecomment-2197948391 From kbarrett at openjdk.org Sat Jun 29 05:07:21 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 29 Jun 2024 05:07:21 GMT Subject: Integrated: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 03:44:53 GMT, Kim Barrett wrote: > Please review this change that replaces some uses of literal 0 as a null > pointer constant in gc code to instead use nullptr. 
> > There is also one place where the use was eliminated entirely, because it was > dead code following a [[noreturn]] call. > > Testing: mach5 tier1. This pull request has now been integrated. Changeset: 8350b1da Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/8350b1daedae8ef5785a7165e664b1d3149b18b7 Stats: 31 lines in 10 files changed: 0 ins; 1 del; 30 mod 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code Reviewed-by: tschatzl, coleenp, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/19934 From jwaters at openjdk.org Sat Jun 29 06:00:21 2024 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 29 Jun 2024 06:00:21 GMT Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored In-Reply-To: References: Message-ID: On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken wrote: > The following build error has been reported with old gcc used > installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes] > static void ALWAYSINLINE crash_with_sigfpe() { > > We can avoid it by not settings the mentioned attribute in case ubsan is not enabled. Wouldn't this result in an undefined macro if UBSAN is off? At least, that's what I suspect might happen sometimes ------------- PR Comment: https://git.openjdk.org/jdk/pull/19937#issuecomment-2197996213 From tanksherman27 at gmail.com Sat Jun 29 06:40:49 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Sat, 29 Jun 2024 14:40:49 +0800 Subject: os::current_stack_pointer seems a little strange Message-ID: Hi all, While looking through HotSpot I recently came across os::current_stack_pointer, and the implementations seem a little odd. 
On Windows x86, ARM64 and macOS Zero, it essentially returns the address of the first local defined in itself, which should correspond to the frame pointer of os::current_stack_pointer, which in turn should equal the stack pointer of the caller, which is what we want to get. Windows x64 instead delegates this task to runtime created assembly from StubRoutines, Linux x86, PowerPC, ARM64, RISC-V, Zero as well as AIX PowerPC all use __builtin_frame_address(0), Linux ARM32 loads the stack pointer using register address sp __asm__ ("sp"); which essentially loads the stack pointer into the sp local, and Linux s390x and macOS x64 and ARM64 all use some form of handwritten assembly to load the current stack pointer (Someone please correct me if I'm wrong about what os::current_stack_pointer is trying to get, as in, the current stack pointer) My concern is that this seems to be a little easy to break. I cannot comment on Windows x64, Linux s390x or macOS x64/ARM64 since I don't really understand their implementations, but for Windows x86, ARM64 and macOS Zero, the obvious worry is when os::current_stack_pointer is inlined or if a jmp instruction to os::current_stack_pointer is used rather than a call as in tail recursion, which would result in an address from anywhere between the frame pointer of the caller to the stack pointer of the caller. The same issue would hold true for Linux x86, PowerPC, ARM64, RISC-V, Zero as well as AIX PowerPC, albeit instead of a range between the caller's frame and stack pointer it would always return the frame pointer of the caller. On the flip side, Linux ARM32 would only work if os::current_stack_pointer was inlined, otherwise it would return the stack pointer of os::current_stack_pointer instead. I see some definitions of os::current_stack_pointer are marked with NOINLINE as appropriate, but not all of them are. 
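The inlining concern above can be illustrated with a small sketch. It is not HotSpot's implementation: `stack_pointer_estimate`, `distance_from_estimate` and `SKETCH_NOINLINE` are hypothetical names, and the code assumes GCC or Clang, where `__builtin_frame_address` and `__attribute__((noinline))` are available. Keeping the function out of line means the frame address it reports belongs to its own frame, which sits next to the caller's stack pointer, rather than to whichever frame the compiler happened to inline it into.

```cpp
#include <cstdint>

// Assumption: GCC or Clang. Other compilers need a different mechanism.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#error "This sketch assumes GCC or Clang"
#endif

// Hypothetical helper, not the HotSpot function. Because it is never
// inlined, level 0 always refers to this function's own frame.
SKETCH_NOINLINE void* stack_pointer_estimate() {
  return __builtin_frame_address(0);
}

// Distance in bytes between the estimate and an address that lives in
// the caller's frame; on common platforms this is a small number.
std::uintptr_t distance_from_estimate(const void* caller_local) {
  const std::uintptr_t est =
      reinterpret_cast<std::uintptr_t>(stack_pointer_estimate());
  const std::uintptr_t loc = reinterpret_cast<std::uintptr_t>(caller_local);
  return est > loc ? est - loc : loc - est;
}
```

The result is only an estimate that points into the calling thread's stack. The compiler may move the stack pointer before and after the call, so the value should not be treated as exact.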
There are probably some more issues that I haven't thought of with the current implementations, such as the prologue of os::current_stack_pointer potentially also distorting the returned address. Is this an issue worth looking into? Best regards, Julian From sroy at openjdk.org Sat Jun 29 06:41:19 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Sat, 29 Jun 2024 06:41:19 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> Message-ID: On Sat, 29 Jun 2024 02:48:58 GMT, Amit Kumar wrote: >> I think the 2nd value is taken at default. > > amitkumar at Amits-MacBook-Pro ~ % cat tt.cpp > #include > using namespace std; > > void fun(int a, char b = 'a', float c = 10.3); > > int main(void) { > fun(10); > fun(10, 103.483); > return 0; > } > > void fun(int a, char b, float c) { > } > > amitkumar at Amits-MacBook-Pro ~ % g++ tt.cpp > tt.cpp:8:11: warning: implicit conversion from 'double' to 'char' changes value from 103.483 to 103 [-Wliteral-conversion] > fun(10, 103.483); > ~~~ ^~~~~~~ > 1 warning generated. > > > That's weird!!! Yeah, I am not sure why the warning did not pop up during the build. It ran through fine. Thanks for pointing it out. I will correct it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659631315 From sroy at openjdk.org Sat Jun 29 06:47:52 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Sat, 29 Jun 2024 06:47:52 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> Message-ID: <3TyG0hn5dAKgn_kP6q3m7PDJmBewnGQZOSHqm5hEAQQ=.378f032c-3b65-404d-b4fd-67ff11fa4405@github.com> On Sat, 29 Jun 2024 02:48:58 GMT, Amit Kumar wrote: >> I think the 2nd value is taken at default. > > amitkumar at Amits-MacBook-Pro ~ % cat tt.cpp > #include > using namespace std; > > void fun(int a, char b = 'a', float c = 10.3); > > int main(void) { > fun(10); > fun(10, 103.483); > return 0; > } > > void fun(int a, char b, float c) { > } > > amitkumar at Amits-MacBook-Pro ~ % g++ tt.cpp > tt.cpp:8:11: warning: implicit conversion from 'double' to 'char' changes value from 103.483 to 103 [-Wliteral-conversion] > fun(10, 103.483); > ~~~ ^~~~~~~ > 1 warning generated. > > > That's weird!!! Maybe this needs to be enforced during build then ? @offamitkumar ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659635447 From sroy at openjdk.org Sat Jun 29 06:47:52 2024 From: sroy at openjdk.org (Suchismith Roy) Date: Sat, 29 Jun 2024 06:47:52 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v18] In-Reply-To: References: Message-ID: > [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) > The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). > > SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. 
> Power10 has the "setbc" / "setbcr" instruction. > > A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). > > The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: default value correction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19886/files - new: https://git.openjdk.org/jdk/pull/19886/files/f5af70c0..0de46f43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19886&range=16-17 Stats: 4 lines in 2 files changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19886/head:pull/19886 PR: https://git.openjdk.org/jdk/pull/19886 From aph-open at littlepinkcloud.com Sat Jun 29 09:14:54 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 29 Jun 2024 10:14:54 +0100 Subject: os::current_stack_pointer seems a little strange In-Reply-To: References: Message-ID: <2fc624c9-082f-4f0f-a57f-af54cf70e9a3@littlepinkcloud.com> On 6/29/24 07:40, Julian Waters wrote: > While looking through HotSpot I recently came across > os::current_stack_pointer, and the implementations seem a little odd. > On Windows x86, ARM64 and macOS Zero, it essentially returns the > address of the first local defined in itself, which should correspond > to the frame pointer of os::current_stack_pointer, which in turn > should equal the stack pointer of the caller, which is what we want to > get. 
Windows x64 instead delegates this task to runtime created > assembly from StubRoutines, Linux x86, PowerPC, ARM64, RISC-V, Zero as > well as AIX PowerPC all use __builtin_frame_address(0), Linux ARM32 > loads the stack pointer using register address sp __asm__ ("sp"); > which essentially loads the stack pointer into the sp local, and Linux > s390x and macOS x64 and ARM64 all use some form of handwritten > assembly to load the current stack pointer > > (Someone please correct me if I'm wrong about what > os::current_stack_pointer is trying to get, as in, the current stack > pointer) This: // Returns an estimate of the current stack pointer. Result must be guaranteed // to point into the calling threads stack, and be no lower than the current // stack pointer. We don't attempt to trace the C++ stack precisely, whatever that would mean in practice. Of course a C++ implementation is entitled to move the stack pointer whenever it chooses, which includes immediately before and after any call to os::current_stack_pointer(). From that point of view, the idea of trying to catch a precise value of the current stack pointer makes sense, sort of, but it's only valid for that instant. Trying to catch such a moving target is like trying to catch smoke. The right question for you to think about is "What is the result of current_stack_pointer used for?" Then consider if those uses are correct. And if you find any that aren't guaranteed to be correct, then consider that to some extent we depend on "reasonable" behaviour of C++ implementations, for some definition of reasonable... -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From amitkumar at openjdk.org Sat Jun 29 09:18:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 09:18:19 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v17] In-Reply-To: <3TyG0hn5dAKgn_kP6q3m7PDJmBewnGQZOSHqm5hEAQQ=.378f032c-3b65-404d-b4fd-67ff11fa4405@github.com> References: <16ZWE0XUOtqmHZ0A_mHQpWtILevlI09asBBFRsCoJos=.3a66ac8f-6314-4a7b-b4a1-9e7c52a5cfa5@github.com> <2YyVMpGE3v2Sl2gqVs0OxIC6Dv19HwGVeJfNY_i80eY=.8df91914-d37b-4d74-9676-1e60c5a79a06@github.com> <3TyG0hn5dAKgn_kP6q3m7PDJmBewnGQZOSHqm5hEAQQ=.378f032c-3b65-404d-b4fd-67ff11fa4405@github.com> Message-ID: On Sat, 29 Jun 2024 06:44:38 GMT, Suchismith Roy wrote: > Maybe this needs to be enforced during build then ? In this type of conversion we get a warning, and HotSpot code treats all warnings as errors, so this should break the build. But what I suspect is that, because registers are being treated as integers, we can write `Register temp = 1`, which is valid syntax. So with `normalise_bool(result, true)` you set `Register dst = result` and `Register temp = 1` (from `true`), and `is_64bit` takes its default value. There seems to be nothing wrong syntax-wise, hence no warnings were generated. 
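A minimal sketch of the suspected mechanism follows. The types and names (`LooseReg`, `RecordedArgs`, `loose_normalize_bool`) are hypothetical stand-ins rather than the PPC64 `Register` class or the real `normalize_bool`, and the `false` default for `is_64bit` is an assumption chosen only to make the mis-binding visible. The point is that a `bool` argument converts silently to an integer-constructible register type, so it lands in the `temp` slot while `is_64bit` keeps its default.

```cpp
// Hypothetical register handle that is implicitly constructible from int.
struct LooseReg {
  int encoding;
  LooseReg(int e) : encoding(e) {}  // deliberately not explicit
};

// Records which value ended up in which parameter.
struct RecordedArgs {
  int dst;
  int temp;
  bool is_64bit;
};

// Hypothetical signature with default arguments; the defaults are assumptions.
RecordedArgs loose_normalize_bool(LooseReg dst,
                                  LooseReg temp = LooseReg(0),
                                  bool is_64bit = false) {
  return RecordedArgs{dst.encoding, temp.encoding, is_64bit};
}

// The caller means "64-bit", but `true` is promoted to int 1 and then
// converted to LooseReg, so it binds to `temp` without any diagnostic.
RecordedArgs call_meant_as_64bit() {
  return loose_normalize_bool(LooseReg(3), true);
}
```

In this sketch the call compiles cleanly because a `bool` to `int` promotion followed by a converting constructor is an ordinary implicit conversion, unlike the `double` to `char` literal conversion in the quoted `tt.cpp` example.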
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19886#discussion_r1659676446 From mdoerr at openjdk.org Sat Jun 29 09:35:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 29 Jun 2024 09:35:20 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v18] In-Reply-To: References: Message-ID: <3SiJyZU3rLqz_eVqRNdKEP0RclV9QBttlB9Z--W0AuA=.c1d13f79-b5ea-4acd-9359-4cac9ab61515@github.com> On Sat, 29 Jun 2024 06:47:52 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > default value correction I don't like a default for `is_64bit`. That increases the risk that somebody uses it incorrectly. It's better to have the user specify it. So, I like the version without default arguments more. Using R0 internally without passing it as argument would be ok with me. But having the temp argument is fine, too. 
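The preference for no default arguments can be sketched as below. This is an illustration under assumptions, not the code in the PR: `StrictReg`, `StrictArgs` and `strict_normalize_bool` are hypothetical names, and marking the constructor `explicit` is an extra safeguard that the review does not ask for. With every argument required, a call that omits `is_64bit` no longer compiles, and with an explicit constructor a stray `bool` cannot turn into a register.

```cpp
#include <type_traits>

// Hypothetical register handle; explicit blocks bool/int -> register.
struct StrictReg {
  int encoding;
  explicit StrictReg(int e) : encoding(e) {}
};

// Records which value ended up in which parameter.
struct StrictArgs {
  int dst;
  int temp;
  bool is_64bit;
};

// No default arguments: the caller has to state the operand width.
StrictArgs strict_normalize_bool(StrictReg dst, StrictReg temp, bool is_64bit) {
  return StrictArgs{dst.encoding, temp.encoding, is_64bit};
}

// Compile-time check that a bool can no longer silently become a register.
static_assert(!std::is_convertible<bool, StrictReg>::value,
              "bool must not convert implicitly to StrictReg");
```

A call such as `strict_normalize_bool(StrictReg(3), StrictReg(0), true)` spells out all three choices, whereas the two-argument form is rejected at compile time.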
------------- PR Comment: https://git.openjdk.org/jdk/pull/19886#issuecomment-2198066844 From mdoerr at openjdk.org Sat Jun 29 10:07:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 29 Jun 2024 10:07:20 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 20:22:46 GMT, Martin Doerr wrote: >> But if we clobber this then verification will fail. Because in this method we are dependent on value present in `r_result`. It's just that I'm making sure that whatever value is there it's either `1` or `0`. > > There are instructions which don't kill the src operand. E.g. ngrk or slrg. Ok. I assume that CPUs which don't support `z_srlk` are no longer supported. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1659737266 From mdoerr at openjdk.org Sat Jun 29 10:12:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 29 Jun 2024 10:12:19 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v11] In-Reply-To: References: Message-ID: <5BijWitaUNWei0eVTLOetgseJ9qCb4QVEQ8k6SVzZds=.9f3a2399-ef61-44e1-9019-7e9cc5f9779c@github.com> On Fri, 28 Jun 2024 15:08:54 GMT, Amit Kumar wrote: >> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) >> >> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. >> >> >> Without Patch: >> >> SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op >> >> SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 
0.017 ns/op >> >> Benchmark Mode Cnt Score Error Units >> SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 0.185 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > extra whitespace Looks correct to me. I leave register usage and other minor things for the other reviewers. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19544#pullrequestreview-2149514645 From amitkumar at openjdk.org Sat Jun 29 12:29:20 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 12:29:20 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v4] In-Reply-To: <_cOM0Ka4W1Lf9aDovwWevwf21OWhm9lg1p3GV8wmqro=.db3001ad-6a46-4c37-9203-9fca2762369a@github.com> References: <_cOM0Ka4W1Lf9aDovwWevwf21OWhm9lg1p3GV8wmqro=.db3001ad-6a46-4c37-9203-9fca2762369a@github.com> Message-ID: On Wed, 26 Jun 2024 19:36:46 GMT, Martin Doerr wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3275: >> >>> 3273: call_stub(StubRoutines::lookup_secondary_supers_table_slow_path_stub()); >>> 3274: >>> 3275: z_bru(L_done); // pass whatever result we got from a slow path >> >> This one branch could be saved by using "load immediate on condition". But it's after slow path processing. > > Right, looks like we only reach here with "false" condition or after return from the stub which should have set the condition code accordingly, too (should be checked / enforced!). I have opened [JDK-8335367](https://bugs.openjdk.org/browse/JDK-8335367) which will take care of this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1659805508 From amitkumar at openjdk.org Sat Jun 29 12:34:18 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 12:34:18 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: References: Message-ID: <6hBdW-lyHeMyNB9pDNw2EH49uOQJwdJnzqKuelGmR-U=.4b3be0d6-e54d-4183-91ff-a43d3acda010@github.com> On Sat, 29 Jun 2024 10:04:49 GMT, Martin Doerr wrote: >> There are instructions which don't kill the src operand. E.g. ngrk or slrg. > > Ok. I assume that CPUs which don't support `z_srlk` are no longer supported. Yes those are not supported. 
To answer your question, `Distinct-operands facility` needs to be installed for `z_srlk`, which is there from `z11` onwards. We are planning to drop the support for `z10 and below versions` for sure, and will remove the checks. I'll file a RFE soon, after discussion with management. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1659806376 From mdoerr at openjdk.org Sat Jun 29 13:08:25 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 29 Jun 2024 13:08:25 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v18] In-Reply-To: References: Message-ID: On Sat, 29 Jun 2024 06:47:52 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > default value correction I don't require more changes. I've run a couple of tests on Power9 and Power10. Also with -XX:+VerifySecondarySupers to test the new code. Results are good. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19886#pullrequestreview-2149603162 From mdoerr at openjdk.org Sat Jun 29 13:13:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 29 Jun 2024 13:13:19 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: <6hBdW-lyHeMyNB9pDNw2EH49uOQJwdJnzqKuelGmR-U=.4b3be0d6-e54d-4183-91ff-a43d3acda010@github.com> References: <6hBdW-lyHeMyNB9pDNw2EH49uOQJwdJnzqKuelGmR-U=.4b3be0d6-e54d-4183-91ff-a43d3acda010@github.com> Message-ID: <4uWxbHGkeSB2N4YJU7Dj03MyJ8fxUaeJdUNCdFRQ57E=.49909e75-dcca-48bb-8e10-c0d89440d8dd@github.com> On Sat, 29 Jun 2024 12:31:06 GMT, Amit Kumar wrote: >> Ok. I assume that CPUs which don't support `z_srlk` are no longer supported. > > Yes those are not supported. > > To answer your question, `Distinct-operands facility` needs to be installed for `z_srlk`, which is there from `z11` onwards. We are planning to drop the support for `z10 and below versions` for sure, and will remove the checks. I'll file a RFE soon, after discussion with management. You can do something like https://github.com/openjdk/jdk/pull/19368/commits/5633ff25982f24d916968001d7a66feeda0a1c7f before that is resolved. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1659819400 From amitkumar at openjdk.org Sat Jun 29 14:05:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 14:05:49 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v12] In-Reply-To: References: Message-ID: <3Sg1rDtV4zRA5Bq4wa205dRenuVjHzsDbZCI0IXV4Zw=.aaf16393-b9b1-4f8f-95d7-f767ac82bd8a@github.com> > s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450) > > I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags. > > > Without Patch: > > SecondarySuperCacheHits.test avgt 15 0.929 ± 0.010 ns/op > > SecondarySuperCacheInterContention.test avgt 15 1.413 ± 0.007 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 1.415 ± 0.016 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 1.410 ± 0.017 ns/op > > Benchmark Mode Cnt Score Error Units > SecondarySupersLookup.testNegative00 avgt 15 1.806 ± 0.325 ns/op > SecondarySupersLookup.testNegative01 avgt 15 2.364 ± 0.236 ns/op > SecondarySupersLookup.testNegative02 avgt 15 2.903 ± 0.215 ns/op > SecondarySupersLookup.testNegative03 avgt 15 3.417 ± 0.199 ns/op > SecondarySupersLookup.testNegative04 avgt 15 3.758 ± 0.102 ns/op > SecondarySupersLookup.testNegative05 avgt 15 4.352 ± 0.123 ns/op > SecondarySupersLookup.testNegative06 avgt 15 4.800 ± 0.099 ns/op > SecondarySupersLookup.testNegative07 avgt 15 5.365 ± 0.060 ns/op > SecondarySupersLookup.testNegative08 avgt 15 6.316 ± 0.092 ns/op > SecondarySupersLookup.testNegative09 avgt 15 6.669 ± 0.164 ns/op > SecondarySupersLookup.testNegative10 avgt 15 7.041 ± 0.164 ns/op > SecondarySupersLookup.testNegative16 avgt 15 9.336 ± 
0.185 ns/op > SecondarySupersLookup.testNegative20 avgt 15 11.373 ± 0.029 ns/op > SecondarySupersLookup.testNegative30 avgt 15 15.236 ± 0.051 ns/op > SecondarySupersLookup.testNegative32 avgt 15 16.031 ± 0.091 ns/op > SecondarySupersLookup.testNegative40 avgt 15 19.197 ± 0.279 ns/op > SecondarySupersLookup.testNegative50 avgt 15 23.804 ± 2.387 ns/op > SecondarySupersLookup.testNegative55 avgt 15 25.610 ± 1.155 ns/op > SecondarySupersLookup.testNegative56 avgt 15 26.128 ± 2.203 ns/op > SecondarySupersLookup.testNegative57 avgt 15 26.126 ± 0.881 ns/op > SecondarySupersLookup.testNegative58 avgt 15 26.314 ± 0.521 ns/op > SecondarySupersLookup.testNegative59 avgt 15 26.750 ± 0.837 ns/op > SecondarySupersLookup.testNegative60 avgt 15 27.118 ± 0.557 ns/op > SecondarySupersLookup.testNegative61 avgt 15 27.763 ± 1.628 ns... Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: exclude z10 & older hardware ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19544/files - new: https://git.openjdk.org/jdk/pull/19544/files/8c7f5509..8ab5a40a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=10-11 Stats: 9 lines in 1 file changed: 7 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544 PR: https://git.openjdk.org/jdk/pull/19544 From amitkumar at openjdk.org Sat Jun 29 14:05:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 14:05:49 GMT Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6] In-Reply-To: <4uWxbHGkeSB2N4YJU7Dj03MyJ8fxUaeJdUNCdFRQ57E=.49909e75-dcca-48bb-8e10-c0d89440d8dd@github.com> References: <6hBdW-lyHeMyNB9pDNw2EH49uOQJwdJnzqKuelGmR-U=.4b3be0d6-e54d-4183-91ff-a43d3acda010@github.com> <4uWxbHGkeSB2N4YJU7Dj03MyJ8fxUaeJdUNCdFRQ57E=.49909e75-dcca-48bb-8e10-c0d89440d8dd@github.com> 
Message-ID: On Sat, 29 Jun 2024 13:10:26 GMT, Martin Doerr wrote: >> Yes those are not supported. >> >> To answer your question, `Distinct-operands facility` needs to be installed for `z_srlk`, which is there from `z11` onwards. We are planning to drop the support for `z10 and below versions` for sure, and will remove the checks. I'll file a RFE soon, after discussion with management. > > You can do something like https://github.com/openjdk/jdk/pull/19368/commits/5633ff25982f24d916968001d7a66feeda0a1c7f before that is resolved. Done, please see the latest commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1659833478 From amitkumar at openjdk.org Sat Jun 29 16:13:20 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 29 Jun 2024 16:13:20 GMT Subject: RFR: JDK-8331732 : [PPC64] Unify and optimize code which converts != 0 to 1 [v18] In-Reply-To: References: Message-ID: On Sat, 29 Jun 2024 06:47:52 GMT, Suchismith Roy wrote: >> [JDK-8331732](https://bugs.openjdk.org/browse/JDK-8331732) >> The template interpreter contains branch-free conversion code for T_BOOLEAN (TemplateInterpreterGenerator::generate_result_handler_for). >> >> SharedRuntime::generate_native_wrapper uses unoptimized code to "Unpack the native result" for T_BOOLEAN. >> Power10 has the "setbc" / "setbcr" instruction. >> >> A new function has been created for the conversion and use "setbcr" on Power10 (determined by VM_Version::has_brw()) and otherwise the branch-free implementation. We should have a function for 32 and one for 64 bit operations (or one with supports both). >> >> The new code for MacroAssembler::verify_secondary_supers_table also uses the new function. > > Suchismith Roy has updated the pull request incrementally with one additional commit since the last revision: > > default value correction LGTM. I did testing on Power8 machine only, No regression seen. ------------- Marked as reviewed by amitkumar (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/19886#pullrequestreview-2149694360 From xpeng at openjdk.org Sat Jun 29 20:03:26 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 29 Jun 2024 20:03:26 GMT Subject: RFR: 8334220: Optimize Klass layout after JDK-8180450 Message-ID: Hi all, This PR optimizes the layout of Klass in HotSpot. After JDK-8180450 the layout of Klass seems broken: there are 3 holes, caused by an alignment issue introduced by the 1-byte `_hash_slot`. (gdb) ptype /ox Klass /* offset | size */ type = class Klass : public Metadata { public: static const uint KLASS_KIND_COUNT; protected: /* 0x000c | 0x0004 */ jint _layout_helper; /* 0x0010 | 0x0004 */ const enum Klass::KlassKind _kind; /* 0x0014 | 0x0004 */ jint _modifier_flags; /* 0x0018 | 0x0004 */ juint _super_check_offset; /* XXX 4-byte hole */ /* 0x0020 | 0x0008 */ class Symbol *_name; /* 0x0028 | 0x0008 */ class Klass *_secondary_super_cache; /* 0x0030 | 0x0008 */ class Array *_secondary_supers; /* 0x0038 | 0x0040 */ class Klass *_primary_supers[8]; /* 0x0078 | 0x0008 */ class OopHandle { private: /* 0x0078 | 0x0008 */ class oop *_obj; /* total size (bytes): 8 */ } _java_mirror; /* 0x0080 | 0x0008 */ class Klass *_super; /* 0x0088 | 0x0008 */ class Klass * volatile _subklass; /* 0x0090 | 0x0008 */ class Klass * volatile _next_sibling; /* 0x0098 | 0x0008 */ class Klass *_next_link; /* 0x00a0 | 0x0008 */ class ClassLoaderData *_class_loader_data; /* 0x00a8 | 0x0008 */ uintx _bitmap; /* 0x00b0 | 0x0001 */ uint8_t _hash_slot; /* XXX 3-byte hole */ /* 0x00b4 | 0x0004 */ int _vtable_len; /* 0x00b8 | 0x0004 */ class AccessFlags { private: /* 0x00b8 | 0x0004 */ jint _flags; /* total size (bytes): 4 */ } _access_flags; /* XXX 4-byte hole */ /* 0x00c0 | 0x0008 */ traceid _trace_id; private: /* 0x00c8 | 0x0002 */ s2 _shared_class_path_index; /* 0x00ca | 0x0002 */ u2 _shared_class_flags; /* 0x00cc | 0x0004 */ int _archived_mirror_index; public: static const int 
SECONDARY_SUPERS_TABLE_SIZE; static const int SECONDARY_SUPERS_TABLE_MASK; static const uintx SECONDARY_SUPERS_BITMAP_EMPTY; static const uintx SECONDARY_SUPERS_BITMAP_FULL; static const int _lh_neutral_value; static const int _lh_instance_slow_path_bit; static const int _lh_log2_element_size_shift; static const int _lh_log2_element_size_mask; static const int _lh_element_type_shift; static const int _lh_element_type_mask; static const int _lh_header_size_shift; static const int _lh_header_size_mask; static const int _lh_array_tag_bits; static const int _lh_array_tag_shift; static const int _lh_array_tag_obj_value; static const unsigned int _lh_array_tag_type_value; /* total size (bytes): 208 */ } As Aleksey suggested, moving _hash_slot to a later position solves the alignment issue. I have tested it and it works well, but it causes 2 smaller holes in the private fields, which could be solved by padding. Layout after moving _hash_slot without padding: /* offset | size */ type = class Klass : public Metadata { public: static const uint KLASS_KIND_COUNT; protected: /* 0x0008 | 0x0004 */ jint _layout_helper; /* 0x000c | 0x0004 */ const enum Klass::KlassKind _kind; /* 0x0010 | 0x0004 */ jint _modifier_flags; /* 0x0014 | 0x0004 */ juint _super_check_offset; /* 0x0018 | 0x0008 */ class Symbol *_name; /* 0x0020 | 0x0008 */ class Klass *_secondary_super_cache; /* 0x0028 | 0x0008 */ class Array *_secondary_supers; /* 0x0030 | 0x0040 */ class Klass *_primary_supers[8]; /* 0x0070 | 0x0008 */ class OopHandle { private: /* 0x0070 | 0x0008 */ oop *_obj; /* total size (bytes): 8 */ } _java_mirror; /* 0x0078 | 0x0008 */ class Klass *_super; /* 0x0080 | 0x0008 */ class Klass * volatile _subklass; /* 0x0088 | 0x0008 */ class Klass * volatile _next_sibling; /* 0x0090 | 0x0008 */ class Klass *_next_link; /* 0x0098 | 0x0008 */ class ClassLoaderData *_class_loader_data; /* 0x00a0 | 0x0008 */ uintx _bitmap; /* 0x00a8 | 0x0004 */ int _vtable_len; /* 0x00ac | 0x0004 */ class AccessFlags { private: 
/* 0x00ac | 0x0004 */ jint _flags; /* total size (bytes): 4 */ } _access_flags; /* 0x00b0 | 0x0008 */ traceid _trace_id; /* 0x00b8 | 0x0001 */ uint8_t _hash_slot; private: /* XXX 1-byte hole */ /* 0x00ba | 0x0002 */ s2 _shared_class_path_index; /* 0x00bc | 0x0002 */ u2 _shared_class_flags; /* XXX 2-byte hole */ /* 0x00c0 | 0x0004 */ int _archived_mirror_index; public: static const int SECONDARY_SUPERS_TABLE_SIZE; static const int SECONDARY_SUPERS_TABLE_MASK; static const uintx SECONDARY_SUPERS_BITMAP_EMPTY; static const uintx SECONDARY_SUPERS_BITMAP_FULL; static const int _lh_neutral_value; static const int _lh_instance_slow_path_bit; static const int _lh_log2_element_size_shift; static const int _lh_log2_element_size_mask; static const int _lh_element_type_shift; static const int _lh_element_type_mask; static const int _lh_header_size_shift; static const int _lh_header_size_mask; static const int _lh_array_tag_bits; static const int _lh_array_tag_shift; static const int _lh_array_tag_obj_value; static const unsigned int _lh_array_tag_type_value; /* XXX 4-byte padding */ /* total size (bytes): 200 */ } ``` Layout after moving _hash_slot with padding: /* offset | size */ type = class Klass : public Metadata { public: static const uint KLASS_KIND_COUNT; protected: /* 0x0008 | 0x0004 */ jint _layout_helper; /* 0x000c | 0x0004 */ const enum Klass::KlassKind _kind; /* 0x0010 | 0x0004 */ jint _modifier_flags; /* 0x0014 | 0x0004 */ juint _super_check_offset; /* 0x0018 | 0x0008 */ class Symbol *_name; /* 0x0020 | 0x0008 */ class Klass *_secondary_super_cache; /* 0x0028 | 0x0008 */ class Array *_secondary_supers; /* 0x0030 | 0x0040 */ class Klass *_primary_supers[8]; /* 0x0070 | 0x0008 */ class OopHandle { private: /* 0x0070 | 0x0008 */ oop *_obj; /* total size (bytes): 8 */ } _java_mirror; /* 0x0078 | 0x0008 */ class Klass *_super; /* 0x0080 | 0x0008 */ class Klass * volatile _subklass; /* 0x0088 | 0x0008 */ class Klass * volatile _next_sibling; /* 0x0090 | 0x0008 */ class 
Klass *_next_link; /* 0x0098 | 0x0008 */ class ClassLoaderData *_class_loader_data; /* 0x00a0 | 0x0008 */ uintx _bitmap; /* 0x00a8 | 0x0004 */ int _vtable_len; /* 0x00ac | 0x0004 */ class AccessFlags { private: /* 0x00ac | 0x0004 */ jint _flags; /* total size (bytes): 4 */ } _access_flags; /* 0x00b0 | 0x0008 */ traceid _trace_id; /* 0x00b8 | 0x0001 */ uint8_t _hash_slot; /* 0x00b9 | 0x0003 */ char _pad_buf1[3]; private: /* 0x00bc | 0x0002 */ s2 _shared_class_path_index; /* 0x00be | 0x0002 */ u2 _shared_class_flags; /* 0x00c0 | 0x0004 */ int _archived_mirror_index; public: static const int SECONDARY_SUPERS_TABLE_SIZE; static const int SECONDARY_SUPERS_TABLE_MASK; static const uintx SECONDARY_SUPERS_BITMAP_EMPTY; static const uintx SECONDARY_SUPERS_BITMAP_FULL; static const int _lh_neutral_value; static const int _lh_instance_slow_path_bit; static const int _lh_log2_element_size_shift; static const int _lh_log2_element_size_mask; static const int _lh_element_type_shift; static const int _lh_element_type_mask; static const int _lh_header_size_shift; static const int _lh_header_size_mask; static const int _lh_array_tag_bits; static const int _lh_array_tag_shift; static const int _lh_array_tag_obj_value; static const unsigned int _lh_array_tag_type_value; /* XXX 4-byte padding */ /* total size (bytes): 200 */ } Best, Xiaolong. 
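
The effect described in the message above (a 1-byte field placed ahead of wider fields forces alignment holes, and moving it later shrinks the object) can be reproduced in miniature. The following is only a sketch under assumptions: `EarlySlot`, `LateSlot` and `wasted_bytes` are hypothetical names, not HotSpot types, and the exact sizes in the comments assume a typical 64-bit ABI where pointers are 8 bytes and 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a class whose 1-byte field is declared first.
// The pointer that follows must be aligned, so the compiler leaves a hole
// after `hash_slot` (7 bytes on a typical 64-bit ABI).
struct EarlySlot {
  std::uint8_t hash_slot;   // 1 byte, then an alignment hole
  void*        name;        // pointer-aligned
  std::int32_t vtable_len;  // 4 bytes, then tail padding
};

// Same fields, with the 1-byte field moved behind the wider ones so that it
// shares a word with `vtable_len` instead of forcing a hole.
struct LateSlot {
  void*        name;
  std::int32_t vtable_len;
  std::uint8_t hash_slot;
};

// Bytes lost to holes and tail padding: total size minus the sum of the
// individual field sizes.
template <typename T>
constexpr std::size_t wasted_bytes() {
  return sizeof(T) -
         (sizeof(std::uint8_t) + sizeof(void*) + sizeof(std::int32_t));
}
```

On a typical 64-bit target `sizeof(EarlySlot)` is 24 and `sizeof(LateSlot)` is 16; on other ABIs the numbers can differ, but reordering does not make the struct larger. Explicit padding members such as `_pad_buf1` in the dump above do not save space by themselves; they make the remaining hole visible and keep the following fields at predictable offsets.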
-------------

Commit messages:
 - Merge branch 'openjdk:master' into klass-layout
 - 8334220: Optimize Klass layout after JDK-8180450

Changes: https://git.openjdk.org/jdk/pull/19958/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19958&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334220
  Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19958.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19958/head:pull/19958

PR: https://git.openjdk.org/jdk/pull/19958

From kbarrett at openjdk.org Sun Jun 30 03:20:17 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sun, 30 Jun 2024 03:20:17 GMT
Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
In-Reply-To:
References:
Message-ID:

On Sat, 29 Jun 2024 05:57:49 GMT, Julian Waters wrote:

> Wouldn't this result in an undefined macro if UBSAN is off? At least, that's what I suspect might happen sometimes

This pre-existing code already deals with ensuring ATTRIBUTE_NO_UBSAN is defined:
https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/sanitizers/ub.hpp#L39-L41

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19937#issuecomment-2198418304

From jwaters at openjdk.org Sun Jun 30 03:28:25 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Sun, 30 Jun 2024 03:28:25 GMT
Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jun 2024 11:04:16 GMT, Matthias Baesken wrote:

> The following build error has been reported when an old gcc is used:
> installers/linux/universal/tar/corretto-build/buildRoot/src/hotspot/share/utilities/vmError.cpp:2068:44: error: 'no_sanitize' attribute directive ignored [-Werror=attributes]
> static void ALWAYSINLINE crash_with_sigfpe() {
>
> We can avoid it by not setting the mentioned attribute in case ubsan is not enabled.

Marked as reviewed by jwaters (Committer).
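
The approach quoted above (only emit the attribute when UBSan is enabled, and otherwise make sure the macro still expands to something) could look roughly like the sketch below. This is not the contents of `ub.hpp`: `EXAMPLE_UBSAN_ENABLED`, `EXAMPLE_NO_UBSAN` and `scale` are hypothetical names, and the build flag is assumed to be passed by the build system only when the sanitizer is switched on.

```cpp
#include <cassert>

// EXAMPLE_UBSAN_ENABLED is a hypothetical flag, assumed to be defined by the
// build (e.g. -DEXAMPLE_UBSAN_ENABLED) only when UBSan is actually in use.
#if defined(EXAMPLE_UBSAN_ENABLED) && (defined(__clang__) || defined(__GNUC__))
// Request the attribute only in sanitizer builds, so compilers that do not
// understand it never see it in ordinary builds.
#define EXAMPLE_NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
// Always define the macro, expanding to nothing, so annotated functions
// still compile when the sanitizer is off.
#define EXAMPLE_NO_UBSAN
#endif

// Hypothetical annotated function; the annotation vanishes in normal builds.
EXAMPLE_NO_UBSAN
inline int scale(int value, int factor) {
  return value * factor;
}
```

Because the `#else` branch defines the macro as empty, call sites never hit an undefined macro, which is the concern raised in the earlier reply.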
-------------

PR Review: https://git.openjdk.org/jdk/pull/19937#pullrequestreview-2149941762

From jwaters at openjdk.org Sun Jun 30 03:28:25 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Sun, 30 Jun 2024 03:28:25 GMT
Subject: RFR: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
In-Reply-To:
References:
Message-ID:

On Sun, 30 Jun 2024 03:17:37 GMT, Kim Barrett wrote:

> > Wouldn't this result in an undefined macro if UBSAN is off? At least, that's what I suspect might happen sometimes
>
> This pre-existing code already deals with ensuring ATTRIBUTE_NO_UBSAN is defined:
>
> https://github.com/openjdk/jdk/blob/d9bcf061450ebfb7fe02b5a50c855db1d9178e5d/src/hotspot/share/sanitizers/ub.hpp#L39-L41

Ah, no idea how I missed that. It's alright then

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19937#issuecomment-2198419339

From iklam at openjdk.org Sun Jun 30 05:51:22 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Sun, 30 Jun 2024 05:51:22 GMT
Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Jun 2024 15:41:48 GMT, Severin Gehwolf wrote:

>> Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions holds, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false` is indicated by the following trace log line:
>>
>>
>> [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present
>>
>>
>> This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. Example:
>>
>>
>> java -XshowSettings:system --version
>> Operating System Metrics:
>>     Provider: cgroupv1
>>     System not containerized.
>> openjdk 23-internal 2024-09-17
>> OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk)
>> OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing)
>>
>>
>> The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed.
>>
>> Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system.
>>
>> Testing:
>>
>> - [x] GHA (risc-v failure seems infra related)
>> - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests)
>> - [x] Some manual testing using cri-o
>>
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:
>
> - Merge branch 'master' into jdk-8261242-is-containerized-fix
> - Refactor mount info matching to helper function
> - Merge branch 'master' into jdk-8261242-is-containerized-fix
> - Remove problem listing of PlainRead which is reworked here
> - Merge branch 'master' into jdk-8261242-is-containerized-fix
> - Merge branch 'master' into jdk-8261242-is-containerized-fix
> - Add doc for mountinfo scanning.
> - Unify naming of variables
> - Merge branch 'master' into jdk-8261242-is-containerized-fix
> - Merge branch 'master' into jdk-8261242-is-containerized-fix
> - ... and 8 more: https://git.openjdk.org/jdk/compare/486aa11e...1017da35

Looks reasonable to me

-------------

Marked as reviewed by iklam (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18201#pullrequestreview-2149956104

From aph at openjdk.org Sun Jun 30 09:39:22 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 30 Jun 2024 09:39:22 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jun 2024 14:19:44 GMT, Amit Kumar wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>>
>> Benchmark                                   Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   add2reg -> z_la

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3414:

> 3412: #ifdef ASSERT
> 3413:   {
> 3414:     // r_result should have either 0 or 1 value

What is this assert block for? `result` is either zero or nonzero, and there's no need for anything stronger.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660113109

From aph at openjdk.org Sun Jun 30 09:45:20 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 30 Jun 2024 09:45:20 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jun 2024 14:19:44 GMT, Amit Kumar wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>>
>> Benchmark                                   Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   add2reg -> z_la

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3243:

> 3241:   // Get the first array index that can contain super_klass.
> 3242:   if (bit != 0) {
> 3243:     pop_count_long(r_array_index, r_array_index, Z_R1_scratch); // all the registers are hardcoded so should be fine

This comment is also rather baffling. You seem to be concerned about something, but what? `pop_count_long` doesn't cause any particular risk, does it?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660114126

From aph at openjdk.org Sun Jun 30 09:49:19 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 30 Jun 2024 09:49:19 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jun 2024 14:19:44 GMT, Amit Kumar wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>>
>> Benchmark                                   Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   add2reg -> z_la

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3322:

> 3320:   // The bitmap is full to bursting.
> 3321:   // Implicit invariant: BITMAP_FULL implies (length > 0)
> 3322:   assert(Klass::SECONDARY_SUPERS_BITMAP_FULL == ~uintx(0), "");

This assert isn't needed.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660114749

From aph at openjdk.org Sun Jun 30 09:52:20 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 30 Jun 2024 09:52:20 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID: <9mdJExrUc4sDKcg33-uU4hxJUeRlBQROkmUCiUN5xXs=.7e85c74e-9eb1-47bf-b799-9035e16751be@github.com>

On Tue, 25 Jun 2024 14:19:44 GMT, Amit Kumar wrote:

>> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>>
>> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>>
>>
>> Without Patch:
>>
>> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>>
>> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>>
>> Benchmark                                   Mode  Cnt   Score   Error  Units
>> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
>> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
>> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
>> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
>> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
>> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
>> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
>> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
>> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
>> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
>> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
>> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
>> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
>> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
>> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
>> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
>> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
>> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
>> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
>> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
>> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
>> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557 ...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
>   add2reg -> z_la

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3327:

> 3325:   z_bre(L_huge);
> 3326:
> 3327:   // NOTE: please load 0 only in r_result, as this is also being used for z_locgr down

Can you please explain what this comment means?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660115211

From duke at openjdk.org Sun Jun 30 14:06:53 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Sun, 30 Jun 2024 14:06:53 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics
Message-ID:

Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with the `Zvkned` extension enabled, the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test passes. I know that hardware implementing this extension is not currently available on the market, but I suppose this PR can be a good starting point for supporting AES intrinsics for RISC-V in OpenJDK.
-------------

Commit messages:
 - 8334999: RISC-V: implement AES single block encryption/decryption intrinsics

Changes: https://git.openjdk.org/jdk/pull/19960/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8334999
  Stats: 261 lines in 3 files changed: 250 ins; 11 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19960.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960

PR: https://git.openjdk.org/jdk/pull/19960

From amitkumar at openjdk.org Sun Jun 30 15:36:19 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sun, 30 Jun 2024 15:36:19 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Sun, 30 Jun 2024 09:42:52 GMT, Andrew Haley wrote:

>> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   add2reg -> z_la
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3243:
>
>> 3241:   // Get the first array index that can contain super_klass.
>> 3242:   if (bit != 0) {
>> 3243:     pop_count_long(r_array_index, r_array_index, Z_R1_scratch); // all the registers are hardcoded so should be fine
>
> This comment is also rather baffling. You seem to be concerned about something, but what? `pop_count_long` doesn't cause any particular risk, does it?

For machines older than `Z15`, `pop_count_long` clobbers the `Z_R1_scratch` register. That's why I added it there.
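
For readers unfamiliar with why a population count yields an array index at all, here is a minimal sketch of the general bitmap-rank idea, assuming a packed array whose occupied slots are recorded in a 64-bit bitmap. `popcount64` and `packed_index` are hypothetical names used for illustration only; this is portable C++, not the s390 code under review, and it does not model the register clobbering discussed above.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (clears the lowest set bit on each iteration).
// Stands in for a hardware popcount instruction.
inline int popcount64(std::uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: if bit `slot` is set in `bitmap`, return the position
// of that slot's entry in a packed array that stores one entry per set bit,
// in ascending bit order. Returns -1 if the slot is empty.
inline int packed_index(std::uint64_t bitmap, unsigned slot) {
  assert(slot < 64);
  const std::uint64_t bit = std::uint64_t{1} << slot;
  if ((bitmap & bit) == 0) {
    return -1;  // nothing stored for this slot
  }
  // Entries for all lower set bits precede this one in the packed array.
  return popcount64(bitmap & (bit - 1));
}
```

With the bitmap `0xB` (bits 0, 1 and 3 set), slot 3 maps to packed index 2 because two lower bits are set, while slot 2 has no entry.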
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660203297

From amitkumar at openjdk.org Sun Jun 30 15:50:19 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sun, 30 Jun 2024 15:50:19 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To: <9mdJExrUc4sDKcg33-uU4hxJUeRlBQROkmUCiUN5xXs=.7e85c74e-9eb1-47bf-b799-9035e16751be@github.com>
References: <9mdJExrUc4sDKcg33-uU4hxJUeRlBQROkmUCiUN5xXs=.7e85c74e-9eb1-47bf-b799-9035e16751be@github.com>
Message-ID:

On Sun, 30 Jun 2024 09:49:35 GMT, Andrew Haley wrote:

>> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   add2reg -> z_la
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3327:
>
>> 3325:   z_bre(L_huge);
>> 3326:
>> 3327:   // NOTE: please load 0 only in r_result, as this is also being used for z_locgr down
>
> Can you please explain what this comment means?

Is this an old comment? The code seems outdated. This has now been moved inside an ASSERT block:

    z_cghi(r_result, 0);
    asm_assert(bcondEqual, "r_result required to be 0, used by z_locgr", 44);

It is used by the `z_locgr` in the loop below:

    NearLabel L_loop;
    bind(L_loop);

    // Check for wraparound.
    z_cgr(r_array_index, r_array_length);
    z_locgr(r_array_index, r_result, bcondHigh); // r_result is containing 0

    z_cg(r_super_klass, Address(r_array_base, r_array_index));
    z_bre(L_done); // success

    // look-ahead check (Bit 2), if bit-2 is also 0, we're done
    testbit(r_bitmap, 2);
    z_bfalse(L_failure);

    z_rllg(r_bitmap, r_bitmap, 64-1); // rotate right
    add2reg(r_array_index, BytesPerWord);

    z_bru(L_loop);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660205202

From amitkumar at openjdk.org Sun Jun 30 15:56:48 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sun, 30 Jun 2024 15:56:48 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v13]
In-Reply-To:
References:
Message-ID:

> s390x Port for [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450)
>
> I ran `tier1` test with `-XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers` in fastdebug vm and didn't see any new failure appearing; Except one I have mentioned [here](https://github.com/openjdk/jdk/pull/19368#issuecomment-2154983693); But this is reproducible on every other architecture with these flags.
>
>
> Without Patch:
>
> SecondarySuperCacheHits.test                avgt   15   0.929 ± 0.010  ns/op
>
> SecondarySuperCacheInterContention.test     avgt   15   1.413 ± 0.007  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt   15   1.415 ± 0.016  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt   15   1.410 ± 0.017  ns/op
>
> Benchmark                                   Mode  Cnt   Score   Error  Units
> SecondarySupersLookup.testNegative00        avgt   15   1.806 ± 0.325  ns/op
> SecondarySupersLookup.testNegative01        avgt   15   2.364 ± 0.236  ns/op
> SecondarySupersLookup.testNegative02        avgt   15   2.903 ± 0.215  ns/op
> SecondarySupersLookup.testNegative03        avgt   15   3.417 ± 0.199  ns/op
> SecondarySupersLookup.testNegative04        avgt   15   3.758 ± 0.102  ns/op
> SecondarySupersLookup.testNegative05        avgt   15   4.352 ± 0.123  ns/op
> SecondarySupersLookup.testNegative06        avgt   15   4.800 ± 0.099  ns/op
> SecondarySupersLookup.testNegative07        avgt   15   5.365 ± 0.060  ns/op
> SecondarySupersLookup.testNegative08        avgt   15   6.316 ± 0.092  ns/op
> SecondarySupersLookup.testNegative09        avgt   15   6.669 ± 0.164  ns/op
> SecondarySupersLookup.testNegative10        avgt   15   7.041 ± 0.164  ns/op
> SecondarySupersLookup.testNegative16        avgt   15   9.336 ± 0.185  ns/op
> SecondarySupersLookup.testNegative20        avgt   15  11.373 ± 0.029  ns/op
> SecondarySupersLookup.testNegative30        avgt   15  15.236 ± 0.051  ns/op
> SecondarySupersLookup.testNegative32        avgt   15  16.031 ± 0.091  ns/op
> SecondarySupersLookup.testNegative40        avgt   15  19.197 ± 0.279  ns/op
> SecondarySupersLookup.testNegative50        avgt   15  23.804 ± 2.387  ns/op
> SecondarySupersLookup.testNegative55        avgt   15  25.610 ± 1.155  ns/op
> SecondarySupersLookup.testNegative56        avgt   15  26.128 ± 2.203  ns/op
> SecondarySupersLookup.testNegative57        avgt   15  26.126 ± 0.881  ns/op
> SecondarySupersLookup.testNegative58        avgt   15  26.314 ± 0.521  ns/op
> SecondarySupersLookup.testNegative59        avgt   15  26.750 ± 0.837  ns/op
> SecondarySupersLookup.testNegative60        avgt   15  27.118 ± 0.557  ns/op
> SecondarySupersLookup.testNegative61        avgt   15  27.763 ± 1.628  ns...

Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:

  removed unnecessary checks

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19544/files
  - new: https://git.openjdk.org/jdk/pull/19544/files/8ab5a40a..98a8f5ad

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19544&range=11-12

  Stats: 12 lines in 1 file changed: 0 ins; 12 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19544.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19544/head:pull/19544

PR: https://git.openjdk.org/jdk/pull/19544

From amitkumar at openjdk.org Sun Jun 30 15:56:49 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Sun, 30 Jun 2024 15:56:49 GMT
Subject: RFR: 8331126: [s390x] secondary_super_cache does not scale well [v6]
In-Reply-To:
References:
Message-ID:

On Sun, 30 Jun 2024 09:37:03 GMT, Andrew Haley wrote:

>> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   add2reg -> z_la
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3414:
>
>> 3412: #ifdef ASSERT
>> 3413:   {
>> 3414:     // r_result should have either 0 or 1 value
>
> What is this assert block for? `result` is either zero or nonzero, and there's no need for anything stronger.

Removed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19544#discussion_r1660206109