From amenkov at openjdk.org Wed May 1 00:20:52 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 1 May 2024 00:20:52 GMT Subject: RFR: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly [v3] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 02:05:10 GMT, Serguei Spitsyn wrote: >> Looks like in JVMTI `current_thread` is more common (and `current` is usually used in runtime :) > > The plan is to unify this with the approach used by the Runtime team. Replaced all touched "current_thread" and "calling_thread" with "current" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18986#discussion_r1585718011 From kvn at openjdk.org Wed May 1 03:38:24 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 May 2024 03:38:24 GMT Subject: RFR: 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field Message-ID: In [JDK-8329433](https://bugs.openjdk.org/browse/JDK-8329433) I changed the type of the `nmethod::_skipped_instructions_size` field to `uint16_t`, assuming that it only counts NOP instructions and GC barriers. I did not take into account that Generational ZGC also includes barrier stubs in this size (original ZGC missed that). It is correct to include them because these stubs are generated in the instructions section and not in the stubs section: Statistics for 1330 bytecoded nmethods for C2: ... ZGC: main code = 3237080 (75.567032%) stubs code = 810577 (25.040375%) skipped insts = 44432 (1.372595%) GenZGC: main code = 4034704 (78.238518%) stubs code = 1356703 (33.625839%) skipped insts = 1074611 (26.634197%) Note, GenZGC has bigger code because it has store barriers. It generates a separate stub for each barrier, no sharing. After looking at how `_skipped_instructions_size` is used (only in one place, when calculating the inlining size of compiled code), I decided to replace it with `int _inline_insts_size;`. It is calculated the same way as before. 
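To illustrate why a 16-bit field is too narrow once barrier stubs are counted, here is a minimal sketch. It is not HotSpot code: `fits_in_u16` and `store_as_u16` are hypothetical helpers, and the sizes used with them are made-up values, chosen only to show that any byte count above 65535 silently wraps when narrowed to `uint16_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of HotSpot): true if a byte count can be
// represented exactly in a 16-bit unsigned field.
inline bool fits_in_u16(int size_in_bytes) {
  return size_in_bytes >= 0 &&
         size_in_bytes <= std::numeric_limits<uint16_t>::max();
}

// Hypothetical helper: the value a 16-bit field would actually hold after
// the narrowing conversion. Unsigned narrowing is well defined in C++ and
// reduces the value modulo 65536, so large sizes are silently truncated.
inline uint16_t store_as_u16(int size_in_bytes) {
  assert(size_in_bytes >= 0);
  return static_cast<uint16_t>(size_in_bytes);
}
```

With these helpers, a size of 1234 bytes round-trips unchanged, while 70000 bytes would be stored as 4464. An `int` field avoids that truncation at the cost of a slightly larger object.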
And instead of including instruction stubs in `_skipped_instructions_size`, I record the size of the instructions in the code section before stubs are generated. This gives a more accurate size of the main instructions and removes the need for `InlineSkippedInstructionsCounter` in GC barrier stubs. I also fixed the code in C2 which estimates the size of the code and stubs sections. Tested tier1-4,tier8,stress,xcomp ------------- Commit messages: - 8331253: 16 bits is not enough for nmethod::_skipped_instructions_size field Changes: https://git.openjdk.org/jdk/pull/19029/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19029&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331253 Stats: 46 lines in 9 files changed: 27 ins; 7 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19029.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19029/head:pull/19029 PR: https://git.openjdk.org/jdk/pull/19029 From rehn at openjdk.org Wed May 1 08:08:51 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 1 May 2024 08:08:51 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> Message-ID: On Tue, 30 Apr 2024 13:46:03 GMT, Fei Yang wrote: >> Make sense? > > I am still thinking about the possibility of unifying `call` and `rt_call`. Having both of them could be confusing to me (and newcomers I guess). What I was talking about in my previous comment is something like this add-on change: > [addon.diff.txt](https://github.com/openjdk/jdk/files/15164874/addon.diff.txt) > What do you think? Off today, I'll have a look tomorrow, thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1586000837 From rehn at openjdk.org Wed May 1 08:13:03 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 1 May 2024 08:13:03 GMT Subject: Integrated: 8331393: AArch64: u32 _partial_subtype_ctr loaded/stored as 64 In-Reply-To: <3xNeycdTwfJhuy6uEm2uCcXl5NN9Nc3RElC0gVfPYQQ=.a5a2bb51-dad5-41c7-aa7d-ace5628832dd@github.com> References: <3xNeycdTwfJhuy6uEm2uCcXl5NN9Nc3RElC0gVfPYQQ=.a5a2bb51-dad5-41c7-aa7d-ace5628832dd@github.com> Message-ID: On Tue, 30 Apr 2024 08:51:03 GMT, Robbin Ehn wrote: > Hi, please consider. > > Let's use incw for these. > > Untested, hoping GHA checks this :) > > Thanks, Robbin This pull request has now been integrated. Changeset: f215899a Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/f215899a088d1abe86adccf0e65a073189272ddd Stats: 9 lines in 2 files changed: 0 ins; 7 del; 2 mod 8331393: AArch64: u32 _partial_subtype_ctr loaded/stored as 64 Reviewed-by: aph, fyang ------------- PR: https://git.openjdk.org/jdk/pull/19011 From kevinw at openjdk.org Wed May 1 08:26:52 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 08:26:52 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned Message-ID: Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. Access to it in is_lock_owned() was racy, and caused rare crashes. 
------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check - Add asserts around move_to calls - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check - 8314225: SIGSEGV in JavaThread::is_lock_owned Changes: https://git.openjdk.org/jdk/pull/18940/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314225 Stats: 77 lines in 8 files changed: 10 ins; 57 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From dlong at openjdk.org Wed May 1 08:26:53 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 May 2024 08:26:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 19:50:08 GMT, Kevin Walls wrote: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. Looks good. After thinking about it some more, I think we need a stronger guarantee that the monitors are always inflated during deoptimization. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18940#pullrequestreview-2021116316 Changes requested by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18940#pullrequestreview-2021693970 From kevinw at openjdk.org Wed May 1 08:26:53 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 08:26:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 19:50:08 GMT, Kevin Walls wrote: > Removal of JavaThread's MonitorChunks member. 
This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. JavaThread's MonitorChunks member is obsolete. In lightweight locking, where an object has its mark word copied/displaced into a thread stack, owner checks can be made by checking if such a pointer is within the stack of a thread. Lock inflation makes a lightweight lock into a heavyweight lock, and we always inflate during OSR and deoptimization, therefore monitor_chunks is obsolete. BasicObjectLock::move_to(oop obj, BasicLock* dest) is called during deoptimization to move the BasicLocks to these chunks, and always inflates the monitor. It doesn't change the object's markword to point to this new address. (Thanks to David, Dean and Patricio for talking this through!) So: JavaThread::is_lock_owned should not check and traverse the _monitor_chunks list. src/hotspot/share/runtime/vframeArray.cpp: This does not need to save MonitorChunks in the JavaThread, which means JavaThread can remove _monitor_chunks and its accessor methods. vframeArrayElement::fill_in(compiledVFrame* vf, bool realloc_failures) This allocates monitor chunks during deoptimization. It can skip saving the MonitorChunk* in the JavaThread, but the MonitorChunk* _monitors is used locally so should stay. (As MonitorChunks are not inserted into a list in the JavaThread, it doesn't even need a _next pointer etc...) Incidental other use of monitor chunks: src/hotspot/share/jfr/leakprofiler/checkpoint/rootResolver.cpp: At a safepoint, where they were always null. Remove this call. 
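The stack-range owner check described above can be sketched as follows. This is an illustration under stated assumptions, not the HotSpot implementation: `StackBounds` and `is_in_stack` are hypothetical names, and the sketch assumes a downward-growing stack whose `base` is one past the highest usable address and whose `end` is the lowest.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one thread's stack (not a HotSpot type).
// The stack is assumed to grow downwards, from 'base' towards 'end'.
struct StackBounds {
  uintptr_t base;  // one past the highest usable stack address
  uintptr_t end;   // lowest usable stack address
};

// A displaced mark word that points into a thread's stack implies that the
// thread owns the lock, so the owner check reduces to a range test on the
// address. No per-thread list needs to be traversed.
inline bool is_in_stack(const StackBounds& stack, uintptr_t addr) {
  assert(stack.end <= stack.base);
  return addr >= stack.end && addr < stack.base;
}
```

Because the test reads only the stack bounds and the address itself, it involves no shared list that another thread could be modifying at the same time, unlike a traversal of `_monitor_chunks`.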
------------- PR Comment: https://git.openjdk.org/jdk/pull/18940#issuecomment-2075736595 From dholmes at openjdk.org Wed May 1 08:33:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 08:33:03 GMT Subject: RFR: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v4] In-Reply-To: References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> <4eN_yJUIi_0MTBROX0yxeIZIYo4W3KNlBGGOSA3glI4=.8e6ec837-1cb3-414f-959c-86fb3e3c9907@github.com> Message-ID: On Mon, 15 Apr 2024 16:03:51 GMT, Afshin Zafari wrote: >> src/hotspot/share/memory/metaspace/testHelpers.cpp line 81: >> >>> 79: if (reserve_limit > 0) { >>> 80: // have reserve limit -> non-expandable context >>> 81: _rs = ReservedSpace(reserve_limit * BytesPerWord, Metaspace::reserve_alignment(), os::vm_page_size(), mtTest); >> >> mtMetaspace > > Done Not done yet ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18745#discussion_r1586016936 From dholmes at openjdk.org Wed May 1 08:33:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 08:33:02 GMT Subject: RFR: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v13] In-Reply-To: References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> Message-ID: On Tue, 23 Apr 2024 06:31:30 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> removed extra blank line. > > This is a big change, but the pattern of the changes is quite easy to follow. > > I do have a couple of queries below. > > Thanks > @dholmes-ora, I am not sure if you got all your comments addressed. Would you please, have a look at here? Thanks. My comments were addressed - thanks - but I will leave it to the experts in this area to grant the approvals. 
I did spot one change in testHelpers.cpp that had not actually been made yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18745#issuecomment-2088150756 From dlong at openjdk.org Wed May 1 08:37:56 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 May 2024 08:37:56 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 19:50:08 GMT, Kevin Walls wrote: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. src/hotspot/share/runtime/vframeArray.cpp line 99: > 97: dest->set_obj(monitor->owner()); > 98: > 99: assert(current_thread->is_Java_thread(), "Must be a JavaThread"); How about making `current_thread` a JavaThread* instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586021282 From dlong at openjdk.org Wed May 1 08:43:53 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 May 2024 08:43:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned In-Reply-To: References: Message-ID: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> On Wed, 24 Apr 2024 19:50:08 GMT, Kevin Walls wrote: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. src/hotspot/share/runtime/vframeArray.cpp line 100: > 98: > 99: assert(current_thread->is_Java_thread(), "Must be a JavaThread"); > 100: assert(ObjectSynchronizer::current_thread_holds_lock((JavaThread*) current_thread, Handle(current_thread, dest->obj())), This makes me wonder about the assert at line 96 that allows monitor->owner() == nullptr. 
If that can happen due to OOM, then we need to check for that here too. src/hotspot/share/runtime/vframeArray.cpp line 317: > 315: BasicObjectLock* src = _monitors->at(index); > 316: top->set_obj(src->obj()); > 317: assert(ObjectSynchronizer::current_thread_holds_lock(thread, Handle(thread, src->obj())), "should be held, before move_to"); Same comment as above, may need to check for null obj. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586025200 PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586028694 From sspitsyn at openjdk.org Wed May 1 08:43:59 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 08:43:59 GMT Subject: RFR: 8330969: scalability issue with loaded JVMTI agent [v3] In-Reply-To: References: Message-ID: <-p7pqoYebg7hw2X1zB4TKN_xf6L0ZAaKA1uB8ajkPDo=.9620a1b6-49cc-4545-86ae-94d1c10d7596@github.com> On Tue, 30 Apr 2024 01:56:13 GMT, Serguei Spitsyn wrote: >> This is a fix of the following JVMTI scalability issue. A closed benchmark with millions of virtual threads shows 3X-4X overhead when a JVMTI agent has been loaded. For instance, this is observable when an app is executed under control of the Oracle Studio `collect` utility. >> For performance analysis, experiments and numbers, please, see the comment below this description. >> >> The fix is to replace the global counter `_VTMS_transition_count` with the mark bit `_VTMS_transition_mark` in each `JavaThread`'. >> >> Testing: >> - Tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: correct comments related to VTMS transition counters Alex and Chris, thank you for review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18937#issuecomment-2088158954 From sspitsyn at openjdk.org Wed May 1 08:44:00 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 08:44:00 GMT Subject: Integrated: 8330969: scalability issue with loaded JVMTI agent In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 16:04:30 GMT, Serguei Spitsyn wrote: > This is a fix of the following JVMTI scalability issue. A closed benchmark with millions of virtual threads shows 3X-4X overhead when a JVMTI agent has been loaded. For instance, this is observable when an app is executed under control of the Oracle Studio `collect` utility. > For performance analysis, experiments and numbers, please, see the comment below this description. > > The fix is to replace the global counter `_VTMS_transition_count` with the mark bit `_VTMS_transition_mark` in each `JavaThread`'. > > Testing: > - Tested with mach5 tiers 1-6 This pull request has now been integrated. Changeset: 663acd2e Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/663acd2e173114fec7c2f50084af9ec56150d394 Stats: 42 lines in 5 files changed: 14 ins; 11 del; 17 mod 8330969: scalability issue with loaded JVMTI agent Reviewed-by: amenkov, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/18937 From mli at openjdk.org Wed May 1 08:53:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 1 May 2024 08:53:53 GMT Subject: RFR: 8331360: RISCV: u32 _partial_subtype_ctr loaded/stored as 64 In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 06:54:56 GMT, Robbin Ehn wrote: > Hi, please consider. > > We should use incrementw() for these. > > Sanity tested, running t1. > > Thanks, Robbin Looks good, thanks! ------------- Marked as reviewed by mli (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19010#pullrequestreview-2033248194 From mli at openjdk.org Wed May 1 08:53:54 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 1 May 2024 08:53:54 GMT Subject: RFR: 8331399: RISC-V: Don't us mv instead of la In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 09:27:09 GMT, Robbin Ehn wrote: > Hi please consider, > > It makes no sense to use mv instead of la. > It doesn't follow the standard mnemonics and it confusing when people use mv when they really mean la. > > la will do the reloc with movptr in this case, so the code is the same. > > Testing t1. > > Thanks, Robbin Looks good, thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19014#pullrequestreview-2033248097 From dholmes at openjdk.org Wed May 1 09:06:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 09:06:52 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 21:11:53 GMT, Doug Simon wrote: > This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. > > The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. 
For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. This can fail as shown here: > > V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) > V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) > V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) > V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) > V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) > V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) > V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) > V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) > V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) > V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) > V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) > V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) > V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) > V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) > > These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/... I don't think "sandbox" fits in this context: > Sandboxing is a security practice in which you use an isolated environment, or a "sandbox," for testing. 
Within the sandbox you run code, analyze the code in a safe, isolated environment without affecting the application, system or platform. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18925#issuecomment-2088182756 From dholmes at openjdk.org Wed May 1 09:09:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 09:09:53 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 07:08:26 GMT, Doug Simon wrote: >> src/hotspot/share/gc/shared/memAllocator.cpp line 127: >> >>> 125: const char* message = _overhead_limit_exceeded ? "GC overhead limit exceeded" : "Java heap space"; >>> 126: // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support >>> 127: report_java_out_of_memory(message); >> >> Not obvious we now need this to be unconditional. > > I think it was a mistake to make it conditional when RetryableAllocationMark was first introduced. The purpose of RAM was to only to resolve a correctness issue wrt to JVMTI (it was seeing the "same" exception being reported twice). The -XX actions do not change the semantics of the exception throwing so can be done unconditionally. But if this is a hidden/internal OOME then why would we treat it as a normal OOME and trigger the XX action? If the allocation routines returned null instead, we would never consider triggering the XX actions for OOME. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586046804 From dholmes at openjdk.org Wed May 1 09:12:54 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 09:12:54 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 07:22:22 GMT, Doug Simon wrote: >> src/hotspot/share/gc/shared/memAllocator.hpp line 139: >> >>> 137: _outer = false; >>> 138: _thread = nullptr; >>> 139: } >> >> It isn't obvious to me how this part is intended to be used. I see it ties back to the retryable allocation "activate" mode, but I'm unclear what that means as well. > > By "this part", do you mean the `else` branch? It exists for the `!activate` case of RetryableAllocationMark which is used when the `null_on_fail` parameter of `JVMCIRuntime::new_instance_common` is true. That is, the runtime call is from compiled code that does *not* want to trigger throwing of an OOME. Graal will deopt in such cases and let the interpreter throw the exception. This ensures the OOME is reported exactly once to JVMTI. "this part" means the "else branch" which means the null-receiving constructor. Yeah that whole "null_on_fail" thing had me a bit perplexed and I see there is now a JBS issue filed to kill it off as we always want null-on-fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586048612 From dholmes at openjdk.org Wed May 1 09:30:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 09:30:52 GMT Subject: RFR: 8331285: Deprecate and obsolete OldSize [v2] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 07:34:22 GMT, Albert Mingkun Yang wrote: >> Simply deprecating a JVM flag. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - merge > - review > - old-size Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18994#pullrequestreview-2033333269 From kevinw at openjdk.org Wed May 1 09:31:52 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 09:31:52 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned In-Reply-To: References: Message-ID: On Wed, 1 May 2024 08:35:00 GMT, Dean Long wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > src/hotspot/share/runtime/vframeArray.cpp line 99: > >> 97: dest->set_obj(monitor->owner()); >> 98: >> 99: assert(current_thread->is_Java_thread(), "Must be a JavaThread"); > > How about making `current_thread` a JavaThread* instead? Yes certainly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586061205 From azafari at openjdk.org Wed May 1 09:33:44 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 1 May 2024 09:33:44 GMT Subject: RFR: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v15] In-Reply-To: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> Message-ID: > `MEMFLAGS flag` is used to hold/show the type of the memory regions in NMT. Each call of NMT API requires a search through the list of memory regions. > The Hotspot code reserves/commits/uncommits memory regions and later calls explicitly NMT API with a specific memory type (e.g., `mtGC`, `mtJavaHeap`) for that region. 
Therefore, there are two searches in the list of regions per reserve/commit/uncommit operation, one for the operation and another for setting the type of the region. > When the memory type is passed in during reserve/commit/uncommit operations, NMT can use it and avoid the extra search for setting the memory type. > > Tests: tiers1-5 passed on linux-x64, macosx-aarch64 and windows-x64 for debug and non-debug builds. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: One missed mtTest is changed to mtMetaspace in testHelpers.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18745/files - new: https://git.openjdk.org/jdk/pull/18745/files/72467f68..512144e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18745&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18745&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18745.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18745/head:pull/18745 PR: https://git.openjdk.org/jdk/pull/18745 From azafari at openjdk.org Wed May 1 09:33:44 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 1 May 2024 09:33:44 GMT Subject: RFR: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v4] In-Reply-To: References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> <4eN_yJUIi_0MTBROX0yxeIZIYo4W3KNlBGGOSA3glI4=.8e6ec837-1cb3-414f-959c-86fb3e3c9907@github.com> Message-ID: On Wed, 1 May 2024 08:28:09 GMT, David Holmes wrote: >> Done > > Not done yet Should be done now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18745#discussion_r1586061028 From dnsimon at openjdk.org Wed May 1 09:38:52 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 09:38:52 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 21:11:53 GMT, Doug Simon wrote: > This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. > > The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. 
This can fail as shown here: > > V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) > V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) > V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) > V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) > V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) > V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) > V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) > V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) > V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) > V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) > V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) > V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) > V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) > V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) > > These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/... Ok, I will rename it to `InternalOOMEMark`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18925#issuecomment-2088212828 From dnsimon at openjdk.org Wed May 1 09:38:52 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 09:38:52 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: References: Message-ID: <9w0MpMhs68Tl4MHxLfuWY2ZRja_JFYBvjD8JRqjcTOY=.aa79b8a4-1abb-48a6-96a2-e4a543c401ea@github.com> On Wed, 1 May 2024 09:07:01 GMT, David Holmes wrote: >> I think it was a mistake to make it conditional when RetryableAllocationMark was first introduced. The purpose of RAM was to only to resolve a correctness issue wrt to JVMTI (it was seeing the "same" exception being reported twice). The -XX actions do not change the semantics of the exception throwing so can be done unconditionally. > > But if this is a hidden/internal OOME then why would we treat it as a normal OOME and trigger the XX action? If the allocation routines returned null instead, we would never consider triggering the XX actions for OOME. It depends on what the purpose of the `-XX` actions is. As far as I can tell, they are for understanding when and why the JVM hits a memory limit from an external perspective. For example, until something like https://bugs.openjdk.org/browse/JDK-8328639 exists, I don't think it would be easy to discover an OOME caused by the string constant resolution done by the JIT. But maybe that doesn't matter? I'm fine with keeping the XX actions conditional if you'd prefer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586064382 From kevinw at openjdk.org Wed May 1 10:05:03 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 10:05:03 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v2] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. 
This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: Feedback from Dean ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/62aadcd7..17760bd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=00-01 Stats: 9 lines in 1 file changed: 3 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From dholmes at openjdk.org Wed May 1 10:05:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 10:05:03 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v2] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 10:00:49 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > Feedback from Dean src/hotspot/share/runtime/javaThread.cpp line 1051: > 1049: assert(LockingMode != LM_LIGHTWEIGHT, "should not be called with new lightweight locking"); > 1050: return Thread::is_lock_owned(adr); > 1051: } Can't we just remove `JavaThread::is_lock_owned` now? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586075215 From dholmes at openjdk.org Wed May 1 10:05:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 10:05:03 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v2] In-Reply-To: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> References: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> Message-ID: On Wed, 1 May 2024 08:39:35 GMT, Dean Long wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback from Dean > > src/hotspot/share/runtime/vframeArray.cpp line 100: > >> 98: >> 99: assert(current_thread->is_Java_thread(), "Must be a JavaThread"); >> 100: assert(ObjectSynchronizer::current_thread_holds_lock((JavaThread*) current_thread, Handle(current_thread, dest->obj())), > > This makes me wonder about the assert at line 96 that allows monitor->owner() == nullptr. If that can happen due to OOM, then we need to check for that here too. Nit: don't use C-style casts use `JavaThread::cast(thread_current)` (though this won't be necessary once you change the type of `current_thread`. > src/hotspot/share/runtime/vframeArray.cpp line 317: > >> 315: BasicObjectLock* src = _monitors->at(index); >> 316: top->set_obj(src->obj()); >> 317: assert(ObjectSynchronizer::current_thread_holds_lock(thread, Handle(thread, src->obj())), "should be held, before move_to"); > > Same comment as above, may need to check for null obj. Not sure how `obj` can be null in this code. ??? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586088980 PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586116316 From dholmes at openjdk.org Wed May 1 10:05:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 10:05:03 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v2] In-Reply-To: References: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> Message-ID: On Wed, 1 May 2024 09:49:06 GMT, David Holmes wrote: >> src/hotspot/share/runtime/vframeArray.cpp line 100: >> >>> 98: >>> 99: assert(current_thread->is_Java_thread(), "Must be a JavaThread"); >>> 100: assert(ObjectSynchronizer::current_thread_holds_lock((JavaThread*) current_thread, Handle(current_thread, dest->obj())), >> >> This makes me wonder about the assert at line 96 that allows monitor->owner() == nullptr. If that can happen due to OOM, then we need to check for that here too. > > Nit: don't use C-style casts use `JavaThread::cast(thread_current)` (though this won't be necessary once you change the type of `current_thread`. We need a RFE to rename `MonitorInfo::_owner` to be `MonitorInfo::_obj` - the current terminology is very confusing! But yes we need to check `dest->obj()` for null here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586112864 From kevinw at openjdk.org Wed May 1 10:05:03 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 10:05:03 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v2] In-Reply-To: References: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> Message-ID: On Wed, 1 May 2024 09:56:59 GMT, David Holmes wrote: >> Nit: don't use C-style casts use `JavaThread::cast(thread_current)` (though this won't be necessary once you change the type of `current_thread`. 
> > We need a RFE to rename `MonitorInfo::_owner` to be `MonitorInfo::_obj` - the current terminology is very confusing! > > But yes we need to check `dest->obj()` for null here. OK yes - if monitor->owner() == nullptr, then the MonitorInfo* we got from vframe->monitors() has a null Handle _owner. current_thread_holds_lock() is going to call read_stable_mark() on the oop, and likely crash, so this is a really good point! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586113374 From kevinw at openjdk.org Wed May 1 10:06:53 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 10:06:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v2] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 09:44:41 GMT, David Holmes wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback from Dean > > src/hotspot/share/runtime/javaThread.cpp line 1051: > >> 1049: assert(LockingMode != LM_LIGHTWEIGHT, "should not be called with new lightweight locking"); >> 1050: return Thread::is_lock_owned(adr); >> 1051: } > > Can't we just remove `JavaThread::is_lock_owned` now? Yes I was thinking about that - it just defers to Thread::is_lock_owned and that has the same assert. I will remove it.
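The null-owner concern raised earlier in this review thread can be illustrated with a small stand-alone sketch. This is not HotSpot code: `Obj`, `MonitorSlot`, `holds_lock` and `slot_ok` are hypothetical stand-ins invented for illustration, and the sketch only shows the general shape of guarding an ownership query that dereferences its argument.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the HotSpot types.
struct Obj { int owner_id; };             // models a locked object and its owner
struct MonitorSlot { const Obj* obj; };   // models a slot whose object may be null

// Models an ownership query that dereferences obj unconditionally,
// so passing nullptr here would be undefined behaviour (likely a crash).
inline bool holds_lock(int thread_id, const Obj* obj) {
  return obj->owner_id == thread_id;
}

// Guarded form: a slot with a null object is accepted without running the query.
inline bool slot_ok(int thread_id, const MonitorSlot& slot) {
  return slot.obj == nullptr || holds_lock(thread_id, slot.obj);
}
```

With this shape, an assert written as `assert(slot_ok(tid, slot))` tolerates a null object while still checking ownership whenever an object is present.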
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586123494 From dholmes at openjdk.org Wed May 1 10:13:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 10:13:02 GMT Subject: RFR: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v13] In-Reply-To: References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> Message-ID: On Tue, 23 Apr 2024 08:54:42 GMT, Afshin Zafari wrote: >> src/hotspot/share/memory/metaspace/testHelpers.cpp line 81: >> >>> 79: if (reserve_limit > 0) { >>> 80: // have reserve limit -> non-expandable context >>> 81: _rs = ReservedSpace(reserve_limit * BytesPerWord, Metaspace::reserve_alignment(), os::vm_page_size(), mtMetaspace); >> >> I would make this mtTest. This should not increase the metaspace counters in NMT > > Done. Huh! Now these comments are appearing saying it should be `mtTest` ??? I think github is messing things up here with hidden comments that make the actual time flow invisible. @tstuefe can you please clarify what this should be. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18745#discussion_r1586128894 From dholmes at openjdk.org Wed May 1 10:18:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 10:18:51 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: <9w0MpMhs68Tl4MHxLfuWY2ZRja_JFYBvjD8JRqjcTOY=.aa79b8a4-1abb-48a6-96a2-e4a543c401ea@github.com> References: <9w0MpMhs68Tl4MHxLfuWY2ZRja_JFYBvjD8JRqjcTOY=.aa79b8a4-1abb-48a6-96a2-e4a543c401ea@github.com> Message-ID: On Wed, 1 May 2024 09:34:37 GMT, Doug Simon wrote: >> But if this is a hidden/internal OOME then why would we treat it as a normal OOME and trigger the XX action? If the allocation routines returned null instead, we would never consider triggering the XX actions for OOME. 
> > It depends on what the purpose of the `-XX` actions is. As far as I can tell, they are for understanding when and why the JVM hits a memory limit from an external perspective. For example, until something like https://bugs.openjdk.org/browse/JDK-8328639 exists, I don't think it would be easy to discover an OOME caused by the string constant resolution done by the JIT. But maybe that doesn't matter? I'm fine with keeping the XX actions conditional if you'd prefer. I think it is better to keep them conditional - thanks. The -XX:OnOutOfMemoryError is an action to take when a user-visible OOME would be thrown. We don't run these actions for VM allocation failures. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586134438 From kevinw at openjdk.org Wed May 1 11:04:21 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 11:04:21 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. 
Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: Remove JavaThread's is_lock_owned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/17760bd5..ce92b92b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=01-02 Stats: 13 lines in 3 files changed: 0 ins; 13 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From kevinw at openjdk.org Wed May 1 11:04:22 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 11:04:22 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 10:04:33 GMT, Kevin Walls wrote: >> src/hotspot/share/runtime/javaThread.cpp line 1051: >> >>> 1049: assert(LockingMode != LM_LIGHTWEIGHT, "should not be called with new lightweight locking"); >>> 1050: return Thread::is_lock_owned(adr); >>> 1051: } >> >> Can't we just remove `JavaThread::is_lock_owned` now? > > Yes I was thinking about that - it just defers to Thread::is_lock_owned and that has the same assert. I will remove it. Updated with this removal. Thread::is_lock_owned had a comment above it from jdk5 or before, referring to JavaThread::is_lock_owned, which I removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586179476 From coleenp at openjdk.org Wed May 1 12:10:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 1 May 2024 12:10:52 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 11:04:21 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. 
This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > Remove JavaThread's is_lock_owned src/hotspot/share/runtime/vframeArray.cpp line 55: > 53: MonitorChunk* chunk = _monitors; > 54: _monitors = nullptr; > 55: delete chunk; Is there just one monitor now on the vframeArrayElement? All of these functions imply there are more than one but if I'm reading this right only one gets deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586235275 From sspitsyn at openjdk.org Wed May 1 12:33:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 12:33:08 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage Message-ID: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> The fix is to degrade virtual thread support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in either of the `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, but we missed this one. The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads to unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used.
Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. Also, please, review the related CSR and Release Note: - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage Testing: - tested impacted and updated tests locally - tested with mach5 tiers 1-6 ------------- Commit messages: - fix trailing space in one test - 8328083: degrade virtual thread support for GetObjectMonitorUsage Changes: https://git.openjdk.org/jdk/pull/19030/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328083 Stats: 156 lines in 12 files changed: 102 ins; 2 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/19030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19030/head:pull/19030 PR: https://git.openjdk.org/jdk/pull/19030 From tonyp at openjdk.org Wed May 1 12:34:52 2024 From: tonyp at openjdk.org (Antonios Printezis) Date: Wed, 1 May 2024 12:34:52 GMT Subject: RFR: 8331399: RISC-V: Don't us mv instead of la In-Reply-To: References: Message-ID: <7YxEIH-MssXs543WWT5P-kSdr7FdHeorzQbrgZ1z3CQ=.29978cdd-4e86-4b29-bd44-7454ce5996f3@github.com> On Tue, 30 Apr 2024 09:27:09 GMT, Robbin Ehn wrote: > Hi please consider, > > It makes no sense to use mv instead of la. 
> It doesn't follow the standard mnemonics and it is confusing when people use mv when they really mean la. > > la will do the reloc with movptr in this case, so the code is the same. > > Testing t1. > > Thanks, Robbin Marked as reviewed by tonyp (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19014#pullrequestreview-2033536087 From tonyp at openjdk.org Wed May 1 12:35:51 2024 From: tonyp at openjdk.org (Antonios Printezis) Date: Wed, 1 May 2024 12:35:51 GMT Subject: RFR: 8331360: RISCV: u32 _partial_subtype_ctr loaded/stored as 64 In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 06:54:56 GMT, Robbin Ehn wrote: > Hi, please consider. > > We should use incrementw() for these. > > Sanity tested, running t1. > > Thanks, Robbin Marked as reviewed by tonyp (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19010#pullrequestreview-2033536613 From kevinw at openjdk.org Wed May 1 13:12:52 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 13:12:52 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 12:08:31 GMT, Coleen Phillimore wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove JavaThread's is_lock_owned > > src/hotspot/share/runtime/vframeArray.cpp line 55: > >> 53: MonitorChunk* chunk = _monitors; >> 54: _monitors = nullptr; >> 55: delete chunk; > > Is there just one monitor now on the vframeArrayElement? All of these functions imply there are more than one but if I'm reading this right only one gets deleted. Thanks Coleen - There is one MonitorChunk, created with new MonitorChunk(list->length()); where list is vf->monitors() and it creates its _monitors member as a NEW_C_HEAP_ARRAY sized by number of monitors. ~MonitorChunk() calls FreeHeap(monitors()); so all the info for possibly multiple monitors is freed in one call as before.
JavaThread had possibly a linked list of MonitorChunks, that is being removed (if we ever used that list, wouldn't that imply multiple deopts in the same JavaThread at the same time??) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586275883 From dnsimon at openjdk.org Wed May 1 13:22:19 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 May 2024 13:22:19 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v2] In-Reply-To: References: Message-ID: > This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. > > The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. 
This can fail as shown here: > > V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) > V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) > V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) > V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) > V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) > V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) > V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) > V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) > V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) > V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) > V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) > V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) > V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) > V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) > > These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/... 
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - don't perform XX actions for OOME when in scope of an InternalOOMEMark - rename SandboxedOOMEMark to InternalOOMEMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18925/files - new: https://git.openjdk.org/jdk/pull/18925/files/137ab236..977bdc28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18925&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18925&range=00-01 Stats: 28 lines in 8 files changed: 0 ins; 1 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/18925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18925/head:pull/18925 PR: https://git.openjdk.org/jdk/pull/18925 From coleenp at openjdk.org Wed May 1 13:26:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 1 May 2024 13:26:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 13:10:29 GMT, Kevin Walls wrote: >> src/hotspot/share/runtime/vframeArray.cpp line 55: >> >>> 53: MonitorChunk* chunk = _monitors; >>> 54: _monitors = nullptr; >>> 55: delete chunk; >> >> Is there just one monitor now on the vframeArrayElement? All of these functions imply there are more than one but if I'm reading this right only one gets deleted. > > Thanks Coleen - > > There is one MonitorChunk, created with new MonitorChunk(list->length()); where list is vf->monitors() > and it creates its _monitors member as a NEW_C_HEAP_ARRAY sized by number of monitors. > > ~MonitorChunk() calls FreeHeap(monitors()); so all the info for possibly multiple monitors is freed in one call as before. > > > JavaThread had possibly a linked list of MonitorChunks, that is being removed (if we ever used that list, wouldn't that imply multiple deopts in the same JavaThread at the same time??) Ok, so I didn't read it wrong. 
If there's only one monitor now on each vframeArrayElement, can you change the name to _monitor and this function should be free_monitor_chunk, singular? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586291859 From kevinw at openjdk.org Wed May 1 13:47:55 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 13:47:55 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: <83KpG4NT8ge0rryQMnc02lSPcWwoYegPxPSB7X8hhf8=.4750e433-a7cf-43f5-a950-f66023449fe5@github.com> On Wed, 1 May 2024 13:24:14 GMT, Coleen Phillimore wrote: >> Thanks Coleen - >> >> There is one MonitorChunk, created with new MonitorChunk(list->length()); where list is vf->monitors() >> and it creates its _monitors member as a NEW_C_HEAP_ARRAY sized by number of monitors. >> >> ~MonitorChunk() calls FreeHeap(monitors()); so all the info for possibly multiple monitors is freed in one call as before. >> >> >> JavaThread had possibly a linked list of MonitorChunks, that is being removed (if we ever used that list, wouldn't that imply multiple deopts in the same JavaThread at the same time??) > > Ok, so I didn't read it wrong. If there's only one monitor now on each vframeArrayElement, can you change the name to _monitor and this function should be free_monitor_chunk, singular? Oh, in a vframeArray there are multiple vframeArrayElement of course. So vframeArray::deallocate_monitor_chunks() has plural "chunks" in its name correctly, as it deals with all the elements. There's only one MonitorChunk* in each vframeArrayElement, and that's called "_monitors". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586315539 From coleenp at openjdk.org Wed May 1 14:31:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 1 May 2024 14:31:51 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: <83KpG4NT8ge0rryQMnc02lSPcWwoYegPxPSB7X8hhf8=.4750e433-a7cf-43f5-a950-f66023449fe5@github.com> References: <83KpG4NT8ge0rryQMnc02lSPcWwoYegPxPSB7X8hhf8=.4750e433-a7cf-43f5-a950-f66023449fe5@github.com> Message-ID: On Wed, 1 May 2024 13:45:39 GMT, Kevin Walls wrote: > There's only one MonitorChunk* in each vframeArrayElement, and that's called "_monitors". If there's only one, should it not be called _monitor (singular?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586358350 From duke at openjdk.org Wed May 1 15:01:54 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 1 May 2024 15:01:54 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 09:48:29 GMT, Hamlin Li wrote: > Hi, Do you have a plan to implement intrinsic `VectorCmpMasked`? It's part of `vectorizedMismatch` Hi @Hamlin-Li, I don't have such a plan for the moment. Why do you think it should be a part of `_vectorizedMismatch` intrinsic? The similar [fix](https://github.com/openjdk/jdk/commit/b05c40ca3b5fd34cbbc7a9479b108a4ff2c099f1?diff=split&w=0) for X64 ([JDK-8266951](https://bugs.openjdk.org/browse/JDK-8266951)) looks like a natural enhancement/followup for the original intrinsic functionality.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2088591081 From iklam at openjdk.org Wed May 1 20:07:07 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 May 2024 20:07:07 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v4] In-Reply-To: References: Message-ID: > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. > > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other sources of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. > > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line at a time.
Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). Ioi Lam has updated the pull request incrementally with three additional commits since the last revision: - BlockInputStream is used by gtest only, so moved it there - removed unused set_position(), etc - removed _must_free ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18833/files - new: https://git.openjdk.org/jdk/pull/18833/files/9c10ae56..bd7986e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=02-03 Stats: 94 lines in 3 files changed: 19 ins; 65 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/18833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18833/head:pull/18833 PR: https://git.openjdk.org/jdk/pull/18833 From iklam at openjdk.org Wed May 1 20:07:07 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 May 2024 20:07:07 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v3] In-Reply-To: <4__55RnizjcZwBGgP4QlfXXX6HBzn5jbRn_xrRPE4uM=.994bc41d-4bb3-4b63-b6dc-b533b598d0a6@github.com> References: <4__55RnizjcZwBGgP4QlfXXX6HBzn5jbRn_xrRPE4uM=.994bc41d-4bb3-4b63-b6dc-b533b598d0a6@github.com> Message-ID: On Tue, 23 Apr 2024 18:29:36 GMT, John R Rose wrote: >> src/hotspot/share/utilities/istream.hpp line 106: >> >>> 104: size_t _end; // offset to end of known current line (else content_end) >>> 105: size_t _next; // offset to known start of next line (else =end) >>> 106: void* _must_free; // unless null, a malloc pointer which we must free >> >> Reading this code, why do we set `_must_free` instead of simply having a method: >> >> ```c++ >> bool must_free() { >> return _buffer != &_small_buffer; >> } >> >> >> and just delete the `_must_free` field. > > Good question. There was a version of the code that accepted a user-supplied buffer, optionally. 
In that case, `_must_free` was set false (or to a user-requested value), since it was up to the user whether the user-supplied buffer should be freed. It could be a static buffer. If there is no longer such an option in the existing constructors, then this field can be GC-ed. I removed `_must_free` and added a new `has_c_heap_buffer()` method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1586766416 From iklam at openjdk.org Wed May 1 20:30:56 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 May 2024 20:30:56 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v3] In-Reply-To: References: <4__55RnizjcZwBGgP4QlfXXX6HBzn5jbRn_xrRPE4uM=.994bc41d-4bb3-4b63-b6dc-b533b598d0a6@github.com> <2K-VA9DRH9DAgDL9HB__STvlnE0gSBRjPNU3NLOrZT0=.7ee74867-cf57-4c13-bd54-751425d2793a@github.com> Message-ID: On Wed, 24 Apr 2024 09:43:17 GMT, Johan Sjölen wrote: >> The `override` keyword is nice; thank you. >> >> I have already argued against the removal of `set_input`. And `set_input` needs `close`. >> >> I think `set_input` is not YAGNI but YIWNI = Yes I will need it. The reply that "you can just wrap another i-stream around the new i-source" is fallacious because of the performance model of i-stream. > > Sorry, I'm still not on board with the `close` operation and I'm against `set_input` calling `close()` :-). Why is it necessary for the `inputStream` to require a file to be re-opened if the `inputStream` switches from one file to another? > > To be clear: OK, we want `set_input` because we don't want to allocate two small buffers, that's fine by me. @jdksjolen I have incorporated most of your suggested changes. I left this code in for now: void inputStream::set_input(inputStream::Input* input) { clear_buffer(); if (_input != nullptr && _input != input) { _input->close(); } ... I am also leaning towards removing the `close()` call.
Otherwise it would be asymmetrical - the `inputStream` doesn't open the `_input` automatically, but it will close it automatically for us. It seems better to leave both the `open` and `close` to the caller of the `inputStream`. @rose00 what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1586789727 From iklam at openjdk.org Wed May 1 20:35:01 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 May 2024 20:35:01 GMT Subject: RFR: 8329728: Read long lines in ClassListParser [v5] In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 17:54:25 GMT, Ioi Lam wrote: >> Today the `ClassListParser` has a hard-coded limit of 4096 chars for each line in the CDS class list file. However, it's possible for a line to be much longer than that (64KB for the class name, plus extra information that can include path names, IDs, etc). >> >> I wrote a utility class `LineReader` that automatically allocates a buffer before calling `fgets()`. Hopefully this can be useful for other cases where we call `fgets()` with a fixed buffer size. >> >> Max line width is limited to 4M to simplify testing (and avoid running into corner cases when we approach INT_MAX). > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into 8329728-read-arbitrary-long-lines-in-class-list-parser > - @dholmes-ora and @calvinccheung comments > - Check class name for valid UTF8 encoding > - @matias9927 and @calvinccheung comments - limit line to 4M. Added gtest cases. Test for class names > 64K > - 8329728: Read arbitrarily long lines in ClassListParser A more comprehensive solution is in [JDK-8330532](https://bugs.openjdk.org/browse/JDK-8330532). Closing this PR as a duplicate.
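The idea described in the quoted PR text (read a line of any length with `fgets()` instead of failing on a fixed 4096-char buffer, with a 4M cap) can be sketched in standard C++ as follows. This is an illustrative sketch only, not the `LineReader` or `inputStream` code from either PR: the function name `read_long_line`, the 256-byte piece size, and the use of `std::string` are assumptions made for the example, and lines containing embedded NUL bytes are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not HotSpot code.
// Reads one line of any length from fp into out, without the trailing '\n'.
// Returns false at end of file when nothing was read, or when the line is
// longer than max_len (default 4M, mirroring the limit mentioned in the PR text).
inline bool read_long_line(std::FILE* fp, std::string& out,
                           std::size_t max_len = 4u * 1024 * 1024) {
  out.clear();
  char piece[256];  // small fixed piece; the std::string grows as needed
  while (std::fgets(piece, sizeof(piece), fp) != nullptr) {
    out.append(piece);
    bool complete = !out.empty() && out.back() == '\n';
    if (complete) {
      out.pop_back();  // drop the newline
    }
    if (out.size() > max_len) {
      return false;    // over the cap: report failure instead of growing further
    }
    if (complete) {
      return true;
    }
  }
  return !out.empty();  // last line without a trailing newline, or EOF
}
```

A caller would loop with `while (read_long_line(fp, line)) { ... }`; the fixed-size array only bounds each `fgets()` call, not the length of the line.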
------------- PR Comment: https://git.openjdk.org/jdk/pull/18669#issuecomment-2089086573 From iklam at openjdk.org Wed May 1 20:35:02 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 May 2024 20:35:02 GMT Subject: Withdrawn: 8329728: Read long lines in ClassListParser In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 04:51:31 GMT, Ioi Lam wrote: > Today the `ClassListParser` has a hard-coded limit of 4096 chars for each line in the CDS class list file. However, it's possible for a line to be much longer than that (64KB for the class name, plus extra information that can include path names, IDs, etc). > > I wrote a utility class `LineReader` that automatically allocates a buffer before calling `fgets()`. Hopefully this can be useful for other cases where we call `fgets()` with a fixed buffer size. > > Max line width is limited to 4M to simplify testing (and avoid running into corner cases when we approach INT_MAX). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/18669 From dlong at openjdk.org Wed May 1 20:46:55 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 May 2024 20:46:55 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: <83KpG4NT8ge0rryQMnc02lSPcWwoYegPxPSB7X8hhf8=.4750e433-a7cf-43f5-a950-f66023449fe5@github.com> Message-ID: On Wed, 1 May 2024 14:28:51 GMT, Coleen Phillimore wrote: >> Oh, in a vframeArray there are multiple vframeArrayElement of course. >> So vframeArray::deallocate_monitor_chunks() has plural "chunks" in its name correctly, as it deals with all the elements. >> >> There's only one MonitorChunk* in each vframeArrayElement, and that's called "_monitors". > >> There's only one MonitorChunk* in each vframeArrayElement, and that's called "_monitors". > > If there's only one, should it not be called _monitor (singular?) A chunk is an array of monitors.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586804558 From dlong at openjdk.org Wed May 1 21:24:54 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 1 May 2024 21:24:54 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 11:04:21 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > Remove JavaThread's is_lock_owned src/hotspot/share/runtime/thread.cpp line 530: > 528: #endif // ASSERT > 529: > 530: bool Thread::is_lock_owned(address adr) const { Is there any reason not to move this to JavaThread now? Also, I don't think it needs to be virtual. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586844763 From cjplummer at openjdk.org Wed May 1 21:27:01 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 1 May 2024 21:27:01 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 10:20:52 GMT, Serguei Spitsyn wrote: > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. 
Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. > > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. > > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. > > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. > > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 I've only looked at specs and tests so far. Still need to review the JVMTI code changes. I looked at the CSR too, but thought it best to just comment on the spec changes here. src/hotspot/share/prims/jvmti.xml line 8259: > 8257: > 8258: > 8259: The number of times the owning platform thread has entered the monitor "the owning platform thread" doesn't really make sense if the monitor is owned by a virtual thread. 
You might want to structure it more like the "owner" description above: The number of times the platform thread owning this monitor has entered it, or 0 if owned by a virtual thread or not owned src/hotspot/share/prims/jvmti.xml line 8266: > 8264: > 8265: The number of platform threads waiting to own this monitor, > 8266: or 0 if the monitor is owned by a virtual thread or not owned Be consistent with above descriptions. They don't say "if the monitor is owned by". They say "if owned by". src/hotspot/share/prims/jvmti.xml line 8279: > 8277: > 8278: The number of platform threads waiting to be notified by this monitor, > 8279: or 0 if the monitor is owned by a virtual thread or not owned Same consistency issue as with `waiter_count` src/java.se/share/data/jdwp/jdwp.spec line 1620: > 1618: ) > 1619: (Reply > 1620: (threadObject owner "The platform thread owning this monitor, or nullptr " I don't think we should be introducing `nullptr` for just this one location. Please stick with `null` for now. src/java.se/share/data/jdwp/jdwp.spec line 1621: > 1619: (Reply > 1620: (threadObject owner "The platform thread owning this monitor, or nullptr " > 1621: "if owned` by a virtual thread or not owned.") You have a dangling back quote after "owned". This is showing up in the CSR too. src/java.se/share/data/jdwp/jdwp.spec line 1622: > 1620: (threadObject owner "The platform thread owning this monitor, or nullptr " > 1621: "if owned` by a virtual thread or not owned.") > 1622: (int entryCount "The number of times the owning platform thread has entered the monitor.") See the comment I left for the JVMTI spec. We should be more complete in the explanation here, explaining how it is 0 for virtual threads. src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java line 348: > 346: /** > 347: * Returns a List containing a {@link ThreadReference} for > 348: * each platform thread currently waiting for this object's monitor.
You need to add "platform" a little below in the `@return` section. src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java line 369: > 367: > 368: /** > 369: * Returns an {@link ThreadReference} for the platform thread, if any, Pre-existing issue: It should be "a" not "an", but then in the `@return` section we are using "the", so maybe we should use similar wording here: `...the {@link ThreadReference} of the platform thread...` test/hotspot/jtreg/serviceability/jvmti/ObjectMonitorUsage/ObjectMonitorUsage.java line 257: > 255: // Correct the expected values for the virtual thread case. > 256: int expEnteringCount = isVirtual ? 0 : NUMBER_OF_ENTERING_THREADS; > 257: int expWaitingCount = isVirtual ? 0 : NUMBER_OF_WAITING_THREADS; There are comments below that still reference NUMBER_OF_ENTERING_THREADS and NUMBER_OF_WAITING_THREADS. test/hotspot/jtreg/vmTestbase/nsk/jdi/ObjectReference/waitingThreads/waitingthreads002.java line 167: > 165: try { > 166: List waitingThreads = objRef.waitingThreads(); > 167: if (waitingThreads.size() != expWaitingCount) { I don't see the need for the expWaitingCount bookkeeping. Can't we just verify that size() is zero if we are using virtual threads? I guess maybe the reason you took this approach is because you don't know if the threads are going to be virtual or not until you check them. There is a way to find out, but it's not that pretty either: static final boolean vthreadMode = "Virtual".equals(System.getProperty("test.thread.factory")); test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage001.java line 65: > 63: } > 64: // Virtual threads are not supported by the GetObjectMonitorUsage. Correct > 65: // the expected values if the test is executed with MainWrapper=virtual. "MainWrapper" is not the proper terminology any more. It's "Test Thread Factory" (JTREG_TEST_THREAD_FACTORY=Virtual). 
test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage001.java line 158: > 156: public void run() { > 157: // Virtual threads are not supported by the GetObjectMonitorUsage. Correct > 158: // the expected values if the test is executed with MainWrapper=virtual. "MainWrapper" again. test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage004.java line 64: > 62: synchronized (lockCheck) { > 63: // Virtual threads are not supported by the GetObjectMonitorUsage. Correct > 64: // the expected values if the test is executed with MainWrapper=virtual. "MainWrapper" again. ------------- PR Review: https://git.openjdk.org/jdk/pull/19030#pullrequestreview-2034390826 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586784250 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586784280 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586792380 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586800777 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586802318 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586803324 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586806802 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586809854 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586833617 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586821719 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586824426 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586827714 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586829010 From kevinw at openjdk.org Wed May 1 21:42:53 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 1 May 2024 21:42:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References:
<4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> Message-ID: <4U-AP8zHxJrxwXYoTcxlpn5OvztYUW-ijTAd5TJ3I_4=.731aeb9c-115a-40c8-9298-577e0fada9ce@github.com> On Wed, 1 May 2024 10:00:13 GMT, David Holmes wrote: >> src/hotspot/share/runtime/vframeArray.cpp line 317: >> >>> 315: BasicObjectLock* src = _monitors->at(index); >>> 316: top->set_obj(src->obj()); >>> 317: assert(ObjectSynchronizer::current_thread_holds_lock(thread, Handle(thread, src->obj())), "should be held, before move_to"); >> >> Same comment as above, may need to check for null obj. > > Not sure how `obj` can be null in this code. ??? That is fetching from an index in the MonitorChunk* _monitors, so if we recognise null in element->fill_in() when populating MonitorChunk*, seems good to recognise it here in vframeArrayElement::unpack_on_stack()? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586858264 From dholmes at openjdk.org Wed May 1 21:55:54 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 21:55:54 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 21:22:15 GMT, Dean Long wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove JavaThread's is_lock_owned > > src/hotspot/share/runtime/thread.cpp line 530: > >> 528: #endif // ASSERT >> 529: >> 530: bool Thread::is_lock_owned(address adr) const { > > Is there any reason not to move this to JavaThread now? Also, I don't think it needs to be virtual. Good point. Only JavaThread's can own ObjectMonitors. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586866888 From dholmes at openjdk.org Wed May 1 22:12:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 22:12:51 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: <4U-AP8zHxJrxwXYoTcxlpn5OvztYUW-ijTAd5TJ3I_4=.731aeb9c-115a-40c8-9298-577e0fada9ce@github.com> References: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> <4U-AP8zHxJrxwXYoTcxlpn5OvztYUW-ijTAd5TJ3I_4=.731aeb9c-115a-40c8-9298-577e0fada9ce@github.com> Message-ID: On Wed, 1 May 2024 21:40:00 GMT, Kevin Walls wrote: >> Not sure how `obj` can be null in this code. ??? > > That is fetching from an index in the MonitorChunk* _monitors, so if we recognise null in element->fill_in() when populating MonitorChunk*, seems good to recognise it here in vframeArrayElement::unpack_on_stack()? I can follow that logic but ... if it is null then what is this code actually doing? We have determined that the frame does contain locked monitors and so we are transferring them across. How can such a locked monitor have a null object? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586877095 From dholmes at openjdk.org Wed May 1 21:59:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 May 2024 21:59:52 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 11:04:21 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > Remove JavaThread's is_lock_owned Looking good! 
------------- PR Review: https://git.openjdk.org/jdk/pull/18940#pullrequestreview-2034521303 From sspitsyn at openjdk.org Wed May 1 22:34:52 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 22:34:52 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 20:21:25 GMT, Chris Plummer wrote:
> > src/hotspot/share/prims/jvmti.xml line 8259: > >> 8257: >> 8258: >> 8259: The number of times the owning platform thread has entered the monitor > > "the owning platform thread" doesn't really make sense if the monitor is owned by a virtual thread. You might want structure it more like the "owner" description above: > > > The number of times the platform thread owning this monitor has has entered it, > or 0 if owned by a virtual thread or not owned Good suggestion, thanks. It is more consistent this way. Updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586890106 From kbarrett at openjdk.org Wed May 1 21:31:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 May 2024 21:31:52 GMT Subject: RFR: 8331352: error: template-id not allowed for constructor/destructor in C++20 In-Reply-To: <1ZltzGMdx6rjCX1VNnYGFbYCi6YfskRwK8p_Rn0Hnek=.97e6c9b7-fbdf-4008-b624-cc34cd1e4a4d@github.com> References: <1ZltzGMdx6rjCX1VNnYGFbYCi6YfskRwK8p_Rn0Hnek=.97e6c9b7-fbdf-4008-b624-cc34cd1e4a4d@github.com> Message-ID: <90atkYrYzeODGrz_lrWPBvouFE-3YknorvE85kgfB9s=.626478b5-0c06-42fa-bbb3-c6b20da84440@github.com> On Tue, 30 Apr 2024 09:14:05 GMT, Julian Waters wrote: > Seems weird that we're facing C++20 issues when HotSpot is only on C++14. This seems like it should be in the disabled warnings list of HotSpot for erroneous warnings that gcc is giving us, just my 2 cents I agree it's a bit weird that the forward -Wc++N-compat warnings are enabled by -Wall, but presumably the gcc maintainers have already had that discussion.
So long as it's not causing significant problems (and in this case I think it's not), I don't think we should disable either the forward compat warnings or the specific warnings that fall under them. If we were going to disable -Wc++20-compat I'd be tempted to re-enable -Wtemplate-id-cdtor as a thing we don't want folks to do anyway. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19009#issuecomment-2089168289 From sspitsyn at openjdk.org Wed May 1 22:42:52 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 22:42:52 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 20:21:28 GMT, Chris Plummer wrote:
> > src/hotspot/share/prims/jvmti.xml line 8266: > >> 8264: >> 8265: The number of platform threads waiting to own this monitor, >> 8266: or 0 if the monitor is owned by a virtual thread or not owned > > Be consistent with above descriptions. They don't say "if the monitor is owned by". They say "if owned by". Good suggestion, thanks. But it is more "incorrect". It should say "is waited by" instead of "is owned by": The number of platform threads waiting to own this monitor, or 0 if the monitor is waited by virtual threads only or not owned Are you okay with this correction?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586894427 From sspitsyn at openjdk.org Wed May 1 22:58:53 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 22:58:53 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 20:30:54 GMT, Chris Plummer wrote:
> > src/hotspot/share/prims/jvmti.xml line 8279: > >> 8277: >> 8278: The number of platform threads waiting to be notified by this monitor, >> 8279: or 0 if the monitor is owned by a virtual thread or not owned > > Same consistency issue as with `waiter_count` Thanks. Let's align it with `waiter_count` solution. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586902737 From sspitsyn at openjdk.org Wed May 1 23:02:52 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 23:02:52 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 20:40:35 GMT, Chris Plummer wrote:
> > src/java.se/share/data/jdwp/jdwp.spec line 1620: > >> 1618: ) >> 1619: (Reply >> 1620: (threadObject owner "The platform thread owning this monitor, or nullptr " > > I don't think we should be introducing `nullptr` for just this one location. Please stick with `null` for now. Good catch, thanks. Updated. > src/java.se/share/data/jdwp/jdwp.spec line 1621: > >> 1619: (Reply >> 1620: (threadObject owner "The platform thread owning this monitor, or nullptr " >> 1621: "if owned` by a virtual thread or not owned.") > > You have a dangling back quote after "owned". This is showing up in the CSR too. Thanks. Fixed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586905504 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586905989 From sspitsyn at openjdk.org Wed May 1 23:09:55 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 May 2024 23:09:55 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 21:03:31 GMT, Chris Plummer wrote:
> > test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage001.java line 65: > >> 63: } >> 64: // Virtual threads are not supported by the GetObjectMonitorUsage. Correct >> 65: // the expected values if the test is executed with MainWrapper=virtual. > > "MainWrapper" is not the proper terminology any more. It's "Test Thread Factory" (JTREG_TEST_THREAD_FACTORY=Virtual). Good suggestion, thanks. Then I'd suggest this: // Virtual threads are not supported by the GetObjectMonitorUsage. // Correct the expected values if the test is executed with the // JTREG_TEST_THREAD_FACTORY=Virtual. > test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage001.java line 158: > >> 156: public void run() { >> 157: // Virtual threads are not supported by the GetObjectMonitorUsage. Correct >> 158: // the expected values if the test is executed with MainWrapper=virtual. > > "MainWrapper" again. Thanks. Same as above. > test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage004.java line 64: > >> 62: synchronized (lockCheck) { >> 63: // Virtual threads are not supported by the GetObjectMonitorUsage. Correct >> 64: // the expected values if the test is executed with MainWrapper=virtual. > > "MainWrappe" again. Thanks. Same as above.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586908764 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586908965 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586909037 From cjplummer at openjdk.org Wed May 1 23:20:52 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 1 May 2024 23:20:52 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> On Wed, 1 May 2024 22:40:02 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmti.xml line 8266: >> >>> 8264: >>> 8265: The number of platform threads waiting to own this monitor, >>> 8266: or 0 if the monitor is owned by a virtual thread or not owned >> >> Be consistent with above descriptions. They don't say "if the monitor is owned by". They say "if owned by". > > Good suggestion, thanks. But it is more "incorrect". It should say "is waited by" instead of "is owned by": > > The number of platform threads waiting to own this monitor, or 0 > if is waited by virtual threads only or no threads are waiting > ``` > Are you okay with this correction? > Or maybe we should say: > > The number of platform threads waiting to own this monitor, or 0 > if virtual threads only are waiting or no threads are waiting Copy and paste issue on my part. I would use "if only virtual threads". >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetObjectMonitorUsage/objmonusage001.java line 65: >> >>> 63: } >>> 64: // Virtual threads are not supported by the GetObjectMonitorUsage. Correct >>> 65: // the expected values if the test is executed with MainWrapper=virtual. >> >> "MainWrapper" is not the proper terminology any more. It's "Test Thread Factory" (JTREG_TEST_THREAD_FACTORY=Virtual). 
> > Good suggestion, thanks. Then I'd suggest this: > > // Virtual threads are not supported by the GetObjectMonitorUsage. > // Correct the expected values if the test is executed with the > // JTREG_TEST_THREAD_FACTORY=Virtual. You can drop "the" from "with the JTREG_TEST_THREAD_FACTORY=Virtual" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586913098 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586913936 From dlong at openjdk.org Thu May 2 00:12:51 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 May 2024 00:12:51 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> <4U-AP8zHxJrxwXYoTcxlpn5OvztYUW-ijTAd5TJ3I_4=.731aeb9c-115a-40c8-9298-577e0fada9ce@github.com> Message-ID: On Wed, 1 May 2024 22:09:54 GMT, David Holmes wrote: >> That is fetching from an index in the MonitorChunk* _monitors, so if we recognise null in element->fill_in() when populating MonitorChunk*, seems good to recognise it here in vframeArrayElement::unpack_on_stack()? > > I can follow that logic but ... if it is null then what is this code actually doing? We have determined that the frame does contain locked monitors and so we are transferring them across. How can such a locked monitor have a null object? I assume it's only for the `fill_in` `realloc_failures` case. But you're right, it doesn't seem very useful. It's just going to look like an unlocked monitor slot in the interpreter frame. We could consider skipping these in `fill_in`, then they won't show up later in `unpack_on_stack`(). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1586938374 From sspitsyn at openjdk.org Thu May 2 00:53:58 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 00:53:58 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> Message-ID: On Wed, 1 May 2024 23:16:09 GMT, Chris Plummer wrote: >> Good suggestion, thanks. But it is more "incorrect". It should say "is waited by" instead of "is owned by": >> >> The number of platform threads waiting to own this monitor, or 0 >> if is waited by virtual threads only or no threads are waiting >> ``` >> Are you okay with this correction? >> Or maybe we should say: >> >> The number of platform threads waiting to own this monitor, or 0 >> if virtual threads only are waiting or no threads are waiting > > Copy and paste issue on my part. I would use "if only virtual threads". Okay, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1586955071 From dholmes at openjdk.org Thu May 2 01:42:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 May 2024 01:42:02 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v2] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 13:22:19 GMT, Doug Simon wrote: >> This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. 
>> >> The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. This can fail as shown here: >> >> V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) >> V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) >> V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) >> V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) >> V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) >> V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) >> V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) >> V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) >> V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) >> V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) >> V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) >> V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) >> V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) >> V [jvm.dll+0x4f3474] 
CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) >> >> These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ec... > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - don't perform XX actions for OOME when in scope of an InternalOOMEMark > - rename SandboxedOOMEMark to InternalOOMEMark src/hotspot/share/gc/shared/memAllocator.hpp line 131: > 129: > 130: public: > 131: InternalOOMEMark(JavaThread* thread) { Suggestion: add a comment: // Passing a null thread allows for a no-op implementation for contexts that will suppress // throwing of the OOME - see RetryableAllocationMark. I was wondering if we really need this. AFAICS it would be harmless to always pass in the current thread and set the thread's field because when we would have passed null then no exception would be thrown anyway. It seems the null thread is only used as a means for RAM to track whether activate was false. But I guess a no-op IOM achieves the same goal. src/hotspot/share/gc/shared/memAllocator.hpp line 151: > 149: } > 150: > 151: // Returns nullptr iff `activate` was false in the constructor. This comment is out of place - `activate` is in the RAM constructor src/hotspot/share/jvmci/jvmciRuntime.cpp line 107: > 105: RetryableAllocationMark(JavaThread* thread, bool activate) : _iom(activate ? 
thread : nullptr) {} > 106: ~RetryableAllocationMark() { > 107: JavaThread* THREAD = _iom.thread(); Please restore comment: `// For exception macros.` src/hotspot/share/jvmci/jvmciRuntime.cpp line 114: > 112: if (ex->is_a(vmClasses::OutOfMemoryError_klass())) { > 113: CLEAR_PENDING_EXCEPTION; > 114: } Just an observation but the original code will clear all exceptions except for an "async" exception, which these days is only the InternalError thrown by unsafe-access-errors. But the new code will only clear OOME thus allowing the (as expected) InternalError to remain, but also any other VirtualMachineErrors that may have arisen e.g. StackOverflowError. I actually think this is more correct, but it does seem a change in behaviour that we may need to be wary of. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586968398 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586968924 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586969275 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1586974448 From dholmes at openjdk.org Thu May 2 02:52:00 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 May 2024 02:52:00 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> Message-ID: On Thu, 2 May 2024 00:51:20 GMT, Serguei Spitsyn wrote: >> Copy and paste issue on my part. I would use "if only virtual threads". > > Okay, thanks. Second suggestion is better. "waited by" is not grammatically correct in this context. 
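The wording being settled here ("the number of platform threads waiting to own this monitor, or 0 if only virtual threads are waiting or no threads are waiting") can be illustrated with a small model. This is only a sketch of the specified behaviour, not HotSpot code: the `Waiter` class and `platformWaiterCount` helper are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WaiterCount {
    // Hypothetical stand-in for a thread blocked trying to own a monitor.
    static final class Waiter {
        final String name;
        final boolean virtual;

        Waiter(String name, boolean virtual) {
            this.name = name;
            this.virtual = virtual;
        }
    }

    // Counts only platform threads, mirroring the proposed spec wording:
    // the result is 0 both when nobody waits and when only virtual threads wait.
    static int platformWaiterCount(List<Waiter> waiters) {
        int count = 0;
        for (Waiter w : waiters) {
            if (!w.virtual) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Waiter> none = Collections.emptyList();
        List<Waiter> onlyVirtual = Arrays.asList(new Waiter("v1", true), new Waiter("v2", true));
        List<Waiter> mixed = Arrays.asList(new Waiter("p1", false), new Waiter("v1", true));

        System.out.println(platformWaiterCount(none));
        System.out.println(platformWaiterCount(onlyVirtual));
        System.out.println(platformWaiterCount(mixed));
    }
}
```

The point of the model is that a caller cannot distinguish "no waiters" from "only virtual waiters" by the count alone, which is what the spec text needs to say.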
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1587002775 From dholmes at openjdk.org Thu May 2 02:52:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 May 2024 02:52:01 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> Message-ID: On Wed, 1 May 2024 23:17:58 GMT, Chris Plummer wrote: >> Good suggestion, thanks. Then I'd suggest this: >> >> // Virtual threads are not supported by the GetObjectMonitorUsage. >> // Correct the expected values if the test is executed with the >> // JTREG_TEST_THREAD_FACTORY=Virtual. > > You can drop "the" from "with the JTREG_TEST_THREAD_FACTORY=Virtual" And drop "the" from "the GetObjectMonitorUsage". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1587003633 From rehn at openjdk.org Thu May 2 06:33:00 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 06:33:00 GMT Subject: Withdrawn: 8330161: RISC-V: Don't use C for Labels jumps In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 13:03:27 GMT, Robbin Ehn wrote: > Hi please consider! > > jal do not have C switch, we always use the full length instructions. > But jalr have, in case of an unbound Label which is to far for jal we can emit c_jalr. > When we bind the Label we can't patch the c_jalr. > > Sanity tested. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/18761 From rehn at openjdk.org Thu May 2 06:33:00 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 06:33:00 GMT Subject: Integrated: 8331360: RISCV: u32 _partial_subtype_ctr loaded/stored as 64 In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 06:54:56 GMT, Robbin Ehn wrote: > Hi, please consider. > > We should use incrementw() for these. > > Sanity tested, running t1. > > Thanks, Robbin This pull request has now been integrated. Changeset: 5ab8713b Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/5ab8713b3fcdf8a1e9d44fc71190845f32449fce Stats: 9 lines in 2 files changed: 0 ins; 7 del; 2 mod 8331360: RISCV: u32 _partial_subtype_ctr loaded/stored as 64 Reviewed-by: fyang, mli, tonyp ------------- PR: https://git.openjdk.org/jdk/pull/19010 From sspitsyn at openjdk.org Thu May 2 06:37:54 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 06:37:54 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> Message-ID: <3PBlt6Id-KHcyBKctS2jcAxbKFQuAQRlG1kIWHwnDWk=.956935c8-0a78-4778-8875-f378414aa94e@github.com> On Thu, 2 May 2024 02:47:19 GMT, David Holmes wrote: >> Okay, thanks. > > Second suggestion is better. "waited by" is not grammatically correct in this context. Thank you, David. 
So, the latest version is: The number of platform threads waiting to own this monitor, or 0 if only virtual threads are waiting or no threads are waiting ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1587131816 From sspitsyn at openjdk.org Thu May 2 06:43:53 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 06:43:53 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <1zzFr4VCy2uAwXew1jEUuLVXpylbM06Vb7wqbhbzCPg=.efc7adf8-185f-4942-a40f-9a13953a2687@github.com> Message-ID: On Thu, 2 May 2024 02:49:35 GMT, David Holmes wrote: >> You can drop "the" from "with the JTREG_TEST_THREAD_FACTORY=Virtual" > > And drop "the" from "the GetObjectMonitorUsage". Thank you. Updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1587137513 From sspitsyn at openjdk.org Thu May 2 06:48:53 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 06:48:53 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> On Wed, 1 May 2024 21:11:36 GMT, Chris Plummer wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. 
>> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. >> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > test/hotspot/jtreg/serviceability/jvmti/ObjectMonitorUsage/ObjectMonitorUsage.java line 257: > >> 255: // Correct the expected values for the virtual thread case. >> 256: int expEnteringCount = isVirtual ? 0 : NUMBER_OF_ENTERING_THREADS; >> 257: int expWaitingCount = isVirtual ? 0 : NUMBER_OF_WAITING_THREADS; > > There are comments below that still reference NUMBER_OF_ENTERING_THREADS and NUMBER_OF_WAITING_THREADS. Thank you for the comment. In fact, I don't know how to fix it. 
Replacing with `expEnteringCount/expWaitingCount` does not make sense to me. The comments are about the tested pattern, not about the real values. Please, let me know if you have any suggestion on fixing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1587142283 From rehn at openjdk.org Thu May 2 06:50:57 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 06:50:57 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> Message-ID: On Wed, 1 May 2024 08:06:17 GMT, Robbin Ehn wrote: >> I am still thinking about the possibility of unifying `call` and `rt_call`. Having both of them could be confusing to me (and new comers I guess). What I was talking about in my previous comment is something like this add-on change: >> [addon.diff.txt](https://github.com/openjdk/jdk/files/15164874/addon.diff.txt) >> What do you think? > > Off today, I'll have a look tomorrow, thanks. Yes, we should use is_32bit_offset_from_codecache to use auipc as much as possible. As there is much churn in this patch and our testing takes so long I was trying to keep assembly same to avoid bisecting down issues. It's much better with just two calls, either rt call or VM leaf. I'll test your patch, thanks. 
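As background for the auipc remark above, here is a minimal sketch of the arithmetic behind an `auipc` + `jalr` pair on RV64: whether a pc-relative offset is reachable, and how it splits into the 20-bit upper part and the signed 12-bit lower part. This is a sketch of the ISA arithmetic only. The helper names are hypothetical, and it is not claimed to match how `is_32bit_offset_from_codecache` is implemented in HotSpot.

```java
public class AuipcReach {
    // auipc adds a sign-extended (imm20 << 12) to pc and jalr adds a signed
    // 12-bit immediate, so the reachable offsets are
    // [-2^31 - 2^11, 2^31 - 2^11 - 1].
    static boolean fitsAuipcPair(long offset) {
        long lowest = -(1L << 31) - (1L << 11);
        long highest = (1L << 31) - (1L << 11) - 1;
        return offset >= lowest && offset <= highest;
    }

    // Upper 20 bits, rounded so that the remaining low part fits in a
    // signed 12-bit immediate.
    static long hi20(long offset) {
        return (offset + 0x800) >> 12;
    }

    // Signed 12-bit remainder, in [-2048, 2047].
    static long lo12(long offset) {
        return offset - (hi20(offset) << 12);
    }

    public static void main(String[] args) {
        long[] samples = {0x12345678L, 0x12345FFFL, -4L, 1L << 31};
        for (long offset : samples) {
            if (fitsAuipcPair(offset)) {
                System.out.println(offset + " -> hi20=" + hi20(offset)
                        + " lo12=" + lo12(offset));
            } else {
                System.out.println(offset + " is out of auipc+jalr range");
            }
        }
    }
}
```

Offsets outside this range are the case where the address has to be materialized some other way, which is the situation the discussion about `far_call` and `rt_call` is concerned with.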
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1587144634 From rehn at openjdk.org Thu May 2 07:09:53 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 07:09:53 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> Message-ID: On Thu, 2 May 2024 06:48:04 GMT, Robbin Ehn wrote: >> Off today, I'll have a look tomorrow, thanks. > > Yes, we should use is_32bit_offset_from_codecache to use auipc as much as possible. > As there is much churn in this patch and our testing takes so long I was trying to keep assembly same to avoid bisecting down issues. > > It's much better with just two calls, either rt call or VM leaf. > > I'll test your patch, thanks. We still need relocates rt_call, not sure why you removed it. It seem like we need two version of rt_call one with address and one with Address. Then it seem like we could remove far_call as the rt_call would do the right thing. I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch. Is there a reason we need to include such in changes in this PR? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1587163354 From azafari at openjdk.org Thu May 2 07:23:08 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 2 May 2024 07:23:08 GMT Subject: Integrated: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API In-Reply-To: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> Message-ID: <0Lvalxj55jlSR1a0qw2-X0Xp_j-RIx0x3KZzOYhmlr0=.7457d51d-8eda-4b4e-b6d6-9c9a860612c0@github.com> On Thu, 11 Apr 2024 15:54:38 GMT, Afshin Zafari wrote: > `MEMFLAGS flag` is used to hold/show the type of the memory regions in NMT. Each call of NMT API requires a search through the list of memory regions. > The Hotspot code reserves/commits/uncommits memory regions and later calls explicitly NMT API with a specific memory type (e.g., `mtGC`, `mtJavaHeap`) for that region. Therefore, there are two search in the list of regions per reserve/commit/uncommit operations, one for the operation and another for setting the type of the region. > When the memory type is passed in during reserve/commit/uncommit operations, NMT can use it and avoid the extra search for setting the memory type. > > Tests: tiers1-5 passed on linux-x64, macosx-aarch64 and windows-x64 for debug and non-debug builds. This pull request has now been integrated. 
Changeset: 4036d7d8 Author: Afshin Zafari URL: https://git.openjdk.org/jdk/commit/4036d7d8246da0550adf8543848606c777da20a1 Stats: 449 lines in 62 files changed: 29 ins; 51 del; 369 mod 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API Reviewed-by: stefank, jsjolen, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/18745 From azafari at openjdk.org Thu May 2 07:23:07 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 2 May 2024 07:23:07 GMT Subject: RFR: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v13] In-Reply-To: References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> Message-ID: On Wed, 1 May 2024 08:30:34 GMT, David Holmes wrote: >> This is a big change, but the pattern of the changes is quite easy to follow. >> >> I do have a couple of queries below. >> >> Thanks > >> @dholmes-ora, I am not sure if you got all your comments addressed. Would you please, have a look at here? Thanks. > > My comments were addressed - thanks - but I will leave it to the experts in this area to grant the approvals. I did spot one change in testHelpers.cpp that had not actually been made yet. Thank you @dholmes-ora for your review with a sharp eye that found that missed change. Thank you @tstuefe, @stefank, @shipilev and @jdksjolen for your comments and reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18745#issuecomment-2089780895 From sspitsyn at openjdk.org Thu May 2 07:23:55 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 07:23:55 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 21:01:16 GMT, Chris Plummer wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. >> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. >> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. 
>> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > test/hotspot/jtreg/vmTestbase/nsk/jdi/ObjectReference/waitingThreads/waitingthreads002.java line 167: > >> 165: try { >> 166: List waitingThreads = objRef.waitingThreads(); >> 167: if (waitingThreads.size() != expWaitingCount) { > > I don't see the need for the expWaitingCount bookkeeping. Can't we just verify that size() is zero if we are using virtual threads? I guess maybe the reason you took this approach is because you don't know if the threads are going to be virtual or not until you check them. There is a way to find out, but it's not that pretty either: > > static final boolean vthreadMode = "Virtual".equals(System.getProperty("test.thread.factory")); Thank you for the suggestion. Updated with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1587177852 From sspitsyn at openjdk.org Thu May 2 07:33:09 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 07:33:09 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. 
Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. > > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. > > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. > > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. 
> > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: Corrections in: 1) JVMTI/JDWP spec; 2) test vthread checks; 3) test comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19030/files - new: https://git.openjdk.org/jdk/pull/19030/files/9a3b8192..7465f064 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=00-01 Stats: 23 lines in 6 files changed: 6 ins; 2 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19030/head:pull/19030 PR: https://git.openjdk.org/jdk/pull/19030 From stuefe at openjdk.org Thu May 2 07:43:08 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 07:43:08 GMT Subject: RFR: 8330076: NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v13] In-Reply-To: References: <5GDKVVPITIzIcyfm-0tKOFzFIEPBgzOe-or1eX_POns=.a5205641-139b-4749-afcc-57ddbc85e6be@github.com> Message-ID: <2ZUb0sqaS1WXReRk0o8Ko2nywN0bYc3xgv2cPJzjWD0=.63938279-f252-427d-a944-1551ebda2acf@github.com> On Wed, 1 May 2024 08:30:34 GMT, David Holmes wrote: >> This is a big change, but the pattern of the changes is quite easy to follow. >> >> I do have a couple of queries below. >> >> Thanks > >> @dholmes-ora, I am not sure if you got all your comments addressed. Would you please, have a look at here? Thanks. 
> > My comments were addressed - thanks - but I will leave it to the experts in this area to grant the approvals. I did spot one change in testHelpers.cpp that had not actually been made yet. Thanks @dholmes-ora for the eagle eye. Background: I prefer mtTest to mtWhatever in gtests, since it is what I expect to see in NMT (storage for test data = mtTest, "live" metaspace memory = mtMetaspace). And god, is the GH interface terrible when patches get large. It's almost impossible to keep an overview. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18745#issuecomment-2089813793 From duke at openjdk.org Thu May 2 07:46:57 2024 From: duke at openjdk.org (Inigo Mediavilla Saiz) Date: Thu, 2 May 2024 07:46:57 GMT Subject: RFR: 8329088: Stack chunk thawing races with concurrent GC stack iteration [v2] In-Reply-To: References: <-GWGR7FPUMDBs4qvaDfnDR6Jfq9QKsQxNorjln_n-Ns=.16653cde-4cc5-4978-a385-41ebcc8e49c2@github.com> Message-ID: On Mon, 22 Apr 2024 09:39:44 GMT, Erik Österlund wrote: >>> Unlike thawing, the freeze operation does not race with the GC by design. >>> >> Is this with the changes in the allocation code in this patch or even before those there was no race? > > Thanks for the detailed review @pchilano! I made the suggested updates. Hello, I wonder @fisk if you were able to reproduce the situation where a GC thread sees inconsistent values for `sp` and `argsize` and whether it would make sense to write a test forcing that situation to happen to prove that this PR fixes the issue and to avoid regressions? I'm not super experienced with the hotspot code, but I'd be happy to try to write a test for that if you think that it makes sense.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18643#issuecomment-2089820311 From rehn at openjdk.org Thu May 2 08:14:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 08:14:16 GMT Subject: Integrated: 8331399: RISC-V: Don't us mv instead of la In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 09:27:09 GMT, Robbin Ehn wrote: > Hi please consider, > > It makes no sense to use mv instead of la. > It doesn't follow the standard mnemonics and it confusing when people use mv when they really mean la. > > la will do the reloc with movptr in this case, so the code is the same. > > Testing t1. > > Thanks, Robbin This pull request has now been integrated. Changeset: dd906ffd Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/dd906ffdcb7d965cd4798cb7eebd9c1b71b3c136 Stats: 9 lines in 2 files changed: 0 ins; 7 del; 2 mod 8331399: RISC-V: Don't us mv instead of la Reviewed-by: fyang, mli, tonyp ------------- PR: https://git.openjdk.org/jdk/pull/19014 From rehn at openjdk.org Thu May 2 08:17:10 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 08:17:10 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v3] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. > Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. > Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. 
> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. > - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into jal-fixes - Corrected method name - Missed a ws - JALR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/31361202..e9bd4d6b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=01-02 Stats: 19203 lines in 1520 files changed: 6345 ins; 8726 del; 4132 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From eosterlund at openjdk.org Thu May 2 08:34:00 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 2 May 2024 08:34:00 GMT Subject: RFR: 8329088: Stack chunk thawing races with concurrent GC stack iteration [v2] In-Reply-To: References: <-GWGR7FPUMDBs4qvaDfnDR6Jfq9QKsQxNorjln_n-Ns=.16653cde-4cc5-4978-a385-41ebcc8e49c2@github.com> Message-ID: On Mon, 22 Apr 2024 09:39:44 GMT, Erik Österlund wrote: >>> Unlike thawing, the freeze operation does not race with the GC by design. >>> >> Is this with the changes in the allocation code in this patch or even before those there was no race? > > Thanks for the detailed review @pchilano! I made the suggested updates. > Hello, I wonder @fisk if you were able to reproduce the situation where a GC thread sees inconsistent values for `sp` and `argsize` and whether it would make sense to write a test forcing that situation to happen to prove that this PR fixes the issue and to avoid regressions? I'm not super experienced with the hotspot code, but I'd be happy to try to write a test for that if you think that it makes sense. Writing a test for this would be rather tricky, as you have to put the GC in a rather particular state, to have it actually race with thawing.
If you want to give it a shot, then that would be nice, but it won't be easy I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18643#issuecomment-2089897313 From eosterlund at openjdk.org Thu May 2 08:34:00 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 2 May 2024 08:34:00 GMT Subject: Integrated: 8329088: Stack chunk thawing races with concurrent GC stack iteration In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 05:54:11 GMT, Erik Österlund wrote: > When we thaw the last frame from a stack chunk, we non-atomically set the stack pointer (sp), and set its argsize to 0. Unfortunately, GC threads may iterate over the frames of the stack chunk concurrently. When initializing their stack frame iterator, they read the sp and argsize racingly. Since there is no synchronization between the threads, we may observe inconsistent pairs of sp and argsize, for example the updated sp with a stale argsize, or the updated argsize with a stale sp. > > At the core of the problem, the stack chunks define sp and argsize. The argsize is used to calculate where the bottom of the stack chunk is, which is required to determine if it is empty or not. This patch proposes to switch things around and store the bottom directly in the chunk, instead of argsize. Instead, argsize is calculated from the bottom. By changing the relationship of which property is stored and which property is calculated, we can simplify this code quite a bit. > > In the new model, is_empty() is true iff sp and bottom are exactly the same. Bottom is only set during freezing, never during thawing. The bottom is initialized whenever the bottom frame is frozen, and left untouched during thawing. Unlike thawing, the freeze operation does not race with the GC by design. Hence we have moved one of the racy mutations to the operation that doesn't race with the GC. The GC is now only exposed to changing sp().
It doesn't matter if it observes the old or new sp(), now that we have removed the only source of inconsistency describing said frame (racing argsize). > > Testing: tier1-5, manual testing of test/jdk/jdk/internal/vm/Continuation This pull request has now been integrated. Changeset: 8bcd2e61 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/8bcd2e61aec51f7c5b09ae162f8cca85a8bbf105 Stats: 111 lines in 12 files changed: 34 ins; 26 del; 51 mod 8329088: Stack chunk thawing races with concurrent GC stack iteration Reviewed-by: stefank, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/18643 From dnsimon at openjdk.org Thu May 2 08:44:57 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 08:44:57 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 01:24:23 GMT, David Holmes wrote: >> Doug Simon has updated the pull request incrementally with two additional commits since the last revision: >> >> - don't perform XX actions for OOME when in scope of an InternalOOMEMark >> - rename SandboxedOOMEMark to InternalOOMEMark > > src/hotspot/share/gc/shared/memAllocator.hpp line 131: > >> 129: >> 130: public: >> 131: InternalOOMEMark(JavaThread* thread) { > > Suggestion: add a comment: > > // Passing a null thread allows for a no-op implementation for contexts that will suppress > // throwing of the OOME - see RetryableAllocationMark. > > I was wondering if we really need this. AFAICS it would be harmless to always pass in the current thread and set the thread's field because when we would have passed null then no exception would be thrown anyway. It seems the null thread is only used as a means for RAM to track whether activate was false. But I guess a no-op IOM achieves the same goal. Throwing of the OOME is never suppressed by InternalOOMEMark. It only changes how the OOME is initialized.
When RetryableAllocationMark passes `thread == null`, it wants the normal OOME initialization to be done and JVMTI events to be fired. In the context of https://bugs.openjdk.org/browse/JDK-8331429, I propose to leave this PR as is. That issue will remove `activate` altogether (cc @mur47x111 ). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587274415 From stefank at openjdk.org Thu May 2 08:50:56 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 2 May 2024 08:50:56 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v2] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 13:22:19 GMT, Doug Simon wrote: >> This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. >> >> The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. 
This can fail as shown here: >> >> V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) >> V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) >> V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) >> V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) >> V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) >> V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) >> V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) >> V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) >> V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) >> V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) >> V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) >> V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) >> V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) >> V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) >> >> These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ec... > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - don't perform XX actions for OOME when in scope of an InternalOOMEMark > - rename SandboxedOOMEMark to InternalOOMEMark I took a look at this because it touches GC code, and therefore have a few nits / style requests related to that. However, don't consider this a full review since I'm not familiar with the part of the code / issues this PR intends to solve. 
src/hotspot/share/gc/shared/memAllocator.cpp line 140: > 138: THROW_OOP_(exception, true); > 139: } else { > 140: THROW_OOP_(Universe::out_of_memory_error_java_heap(/* omit_backtrace*/ true), true); Given that the only explicitly passed in value for `omit_backtrace` is `true` I think it would be nicer to create a separate function for this case, instead of having a comment always explaining what true stands for. Maybe `out_of_memory_error_java_heap_omit_backtrace()`? src/hotspot/share/gc/shared/memAllocator.hpp line 131: > 129: > 130: public: > 131: InternalOOMEMark(JavaThread* thread) { Suggestion: explicit InternalOOMEMark(JavaThread* thread) { src/hotspot/share/memory/universe.cpp line 658: > 656: oome = gen_out_of_memory_error(oome); > 657: } > 658: return oome; It could be nice to get rid of the double negation here: Suggestion: oop Universe::out_of_memory_error_java_heap(bool omit_backtrace) { oop oome = out_of_memory_errors()->obj_at(_oom_java_heap); if (omit_backtrace) { return oome; } return gen_out_of_memory_error(oome); src/hotspot/share/memory/universe.hpp line 274: > 272: // may or may not have a backtrace. If error has a backtrace then the stack trace is already > 273: // filled in. > 274: static oop out_of_memory_error_java_heap(bool omit_backtrace=false); Suggestion: static oop out_of_memory_error_java_heap(bool omit_backtrace = false); src/hotspot/share/oops/klass.cpp line 881: > 879: THROW_OOP(Universe::out_of_memory_error_array_size()); > 880: } else { > 881: THROW_OOP(Universe::out_of_memory_error_java_heap(/* omit_backtrace*/ true)); Suggestion: THROW_OOP(Universe::out_of_memory_error_java_heap(/* omit_backtrace */ true)); src/hotspot/share/runtime/javaThread.hpp line 57: > 55: class JNIHandleBlock; > 56: class JVMCIRuntime; > 57: class InternalOOMEMark; It would be nice to get this sorted as the other forward declarations. 
src/hotspot/share/runtime/javaThread.hpp line 718: > 716: bool in_internal_oome_mark() const { return _in_internal_oome_mark; } > 717: void set_in_internal_oome_mark(bool b) { _in_internal_oome_mark = b; } > 718: Should all these be prefixed with `is` like: bool is_in_VTMS_transition() const { return _is_in_VTMS_transition; } bool is_in_tmp_VTMS_transition() const { return _is_in_tmp_VTMS_transition; } bool is_in_any_VTMS_transition() const { return _is_in_VTMS_transition || _is_in_tmp_VTMS_transition; } ------------- PR Review: https://git.openjdk.org/jdk/pull/18925#pullrequestreview-2035108670 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587257272 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587241226 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587252177 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587252545 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587252963 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587248746 PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587268629 From dnsimon at openjdk.org Thu May 2 08:50:57 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 08:50:57 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 01:38:47 GMT, David Holmes wrote: >> Doug Simon has updated the pull request incrementally with two additional commits since the last revision: >> >> - don't perform XX actions for OOME when in scope of an InternalOOMEMark >> - rename SandboxedOOMEMark to InternalOOMEMark > > src/hotspot/share/jvmci/jvmciRuntime.cpp line 114: > >> 112: if (ex->is_a(vmClasses::OutOfMemoryError_klass())) { >> 113: CLEAR_PENDING_EXCEPTION; >> 114: } > > Just an observation but the original code will clear all exceptions except for an "async" exception, which these 
days is only the InternalError thrown by unsafe-access-errors. But the new code will only clear OOME thus allowing the (as expected) InternalError to remain, but also any other VirtualMachineErrors that may have arisen e.g. StackOverflowError. I actually think this is more correct, but it does seem a change in behaviour that we may need to be wary of. In the context of Graal, it doesn't really make much of a difference as the Graal stub that calls this runtime routine will clear all exceptions anyway. But yes, I think limiting the clearing here to OOME is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587281572 From kevinw at openjdk.org Thu May 2 08:58:20 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 2 May 2024 08:58:20 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v4] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. 
Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: Move is_lock_owned from Thread to JavaThread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/ce92b92b..3e9dd511 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=02-03 Stats: 18 lines in 5 files changed: 9 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From rehn at openjdk.org Thu May 2 09:00:57 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 09:00:57 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 08:17:10 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. 
>> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into jal-fixes > - Corrected method name > - Missed a ws > - JALR My merge with master was hit by: https://bugs.openjdk.org/browse/JDK-8331546 Re-merge when it's fixed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2089942697 From kevinw at openjdk.org Thu May 2 09:01:54 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 2 May 2024 09:01:54 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 21:53:29 GMT, David Holmes wrote: >> src/hotspot/share/runtime/thread.cpp line 530: >> >>> 528: #endif // ASSERT >>> 529: >>> 530: bool Thread::is_lock_owned(address adr) const { >> >> Is there any reason not to move this to JavaThread now? Also, I don't think it needs to be virtual. > > Good point. Only JavaThread's can own ObjectMonitors. OK yes - can move that to JavaThread, with just adding one cast in synchronizer.cpp, where ObjectSynchronizer::FastHashCode(Thread*, oop) uses is_lock_owned. (ObjectSynchronizer::FastHashCode may be a candidate for taking JavaThread instead, maybe chasing down the users of that is a separate task. 8-) ) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1587298837 From dnsimon at openjdk.org Thu May 2 09:02:54 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 09:02:54 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v2] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 08:37:33 GMT, Stefan Karlsson wrote: >> Doug Simon has updated the pull request incrementally with two additional commits since the last revision: >> >> - don't perform XX actions for OOME when in scope of an InternalOOMEMark >> - rename SandboxedOOMEMark to InternalOOMEMark > > src/hotspot/share/runtime/javaThread.hpp line 718: > >> 716: bool in_internal_oome_mark() const { return _in_internal_oome_mark; } >> 717: void set_in_internal_oome_mark(bool b) { _in_internal_oome_mark = b; } >> 718: > > Should all these be prefixed with `is` like: > > bool is_in_VTMS_transition() const { return _is_in_VTMS_transition; } > bool 
is_in_tmp_VTMS_transition() const { return _is_in_tmp_VTMS_transition; } > bool is_in_any_VTMS_transition() const { return _is_in_VTMS_transition || _is_in_tmp_VTMS_transition; } Ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18925#discussion_r1587300112 From dnsimon at openjdk.org Thu May 2 09:08:34 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 May 2024 09:08:34 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v3] In-Reply-To: References: Message-ID: > This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. > > The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. 
This can fail as shown here: > > V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) > V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) > V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) > V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) > V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) > V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) > V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) > V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) > V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) > V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) > V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) > V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) > V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) > V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) > > These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/... 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: addressed review comments and suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18925/files - new: https://git.openjdk.org/jdk/pull/18925/files/977bdc28..545714a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18925&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18925&range=01-02 Stats: 27 lines in 8 files changed: 3 ins; 2 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/18925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18925/head:pull/18925 PR: https://git.openjdk.org/jdk/pull/18925 From jkern at openjdk.org Thu May 2 09:59:01 2024 From: jkern at openjdk.org (Joachim Kern) Date: Thu, 2 May 2024 09:59:01 GMT Subject: RFR: 8330539: Use #include <alloca.h> instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX Message-ID: We need to find a better way to handle alloca on AIX. See the discussion in the PR for https://bugs.openjdk.org/browse/JDK-8329257, e.g. https://github.com/openjdk/jdk/pull/18536#discussion_r1568650313 in which three alternatives are suggested. Quoting: Let me summarize the choices we have and ask for your vote. Magnus dislikes the -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 I introduced to get rid of #if defined(_AIX) #include <alloca.h> #endif in globalDefinitions_gcc.hpp. We have four possible solutions 1. Reintroduce #if defined(_AIX) #include <alloca.h> #endif in globalDefinitions_gcc.hpp. 2. Unconditionally introduce only #include <alloca.h> in globalDefinitions_gcc.hpp. This should work for all platforms using this header including the unofficial Windows/gcc Port, although only AIX needs it. 3. Add #if defined(_AIX) #include <alloca.h> #endif to the sources using alloca(). These are /hotspot/share/runtime/os.cpp /hotspot/share/runtime/javaThread.cpp /hotspot/share/utilities/vmError.cpp Here we need the AIX condition, because otherwise the classic Windows Build (NTAMD64) fails. 4.
Replace -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 by -U__STRICT_ANSI__ at the same place. Explanation can also found in https://github.com/openjdk/jdk/pull/18536#discussion_r1583360569 and following. I will implement the solution with the most likes and having no dislike. ------------- Commit messages: - JDK-8330539 Changes: https://git.openjdk.org/jdk/pull/19053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330539 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19053/head:pull/19053 PR: https://git.openjdk.org/jdk/pull/19053 From sspitsyn at openjdk.org Thu May 2 10:13:00 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 10:13:00 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed Message-ID: Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition. This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. 
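[Editor's sketch, not part of the original message.] The change described above swaps an assert for a conditional return, so that an event arriving during a transition is silently skipped instead of tripping a debug check. A minimal illustration of that pattern follows; `FakeThread`, `EventLog` and `post_event` are hypothetical stand-ins invented for this note and are not HotSpot or JVMTI code. Only the predicate name `is_in_any_VTMS_transition` is borrowed from the thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a thread; not HotSpot's JavaThread.
struct FakeThread {
  bool in_vtms_transition = false;
  bool is_in_any_VTMS_transition() const { return in_vtms_transition; }
};

// Hypothetical record of the events that were actually posted.
struct EventLog {
  std::vector<std::string> posted;
};

// Posts the named event unless the thread is in a transition.
// The early return plays the role that an assert
// (assert(!thread.is_in_any_VTMS_transition())) would otherwise play.
inline bool post_event(const FakeThread& thread, const std::string& name,
                       EventLog& log) {
  if (thread.is_in_any_VTMS_transition()) {
    return false;  // skip the event instead of asserting
  }
  log.posted.push_back(name);
  return true;
}
```

With this shape, all event kinds behave the same way in product and debug builds: a caller that reaches the posting code during a transition simply gets no event.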
Testing: - TBD: submit mach5 tiers 1-6 ------------- Commit messages: - 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed Changes: https://git.openjdk.org/jdk/pull/19054/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19054&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330146 Stats: 9 lines in 1 file changed: 2 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19054/head:pull/19054 PR: https://git.openjdk.org/jdk/pull/19054 From jsjolen at openjdk.org Thu May 2 10:32:57 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 10:32:57 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v4] In-Reply-To: References: Message-ID: On Wed, 1 May 2024 20:07:07 GMT, Ioi Lam wrote: >> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) >> >> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example: >> >> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 >> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 >> >> Common problems: >> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. >> - There's ad-hoc code that deals with `FILE*` differently than from memory. >> >> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. >> >> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. 
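[Editor's sketch, not part of the original message.] The two ideas in the description above, a line buffer that grows instead of being fixed-size and a pluggable input source, can be illustrated roughly as follows. This is not the `inputStream` API from the PR; `InputSketch`, `MemoryInputSketch` and `LineReaderSketch` are hypothetical names, and `std::string` is used for the growable buffer purely for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical input source interface (illustrative only).
class InputSketch {
 public:
  virtual ~InputSketch() = default;
  // Copies up to `size` bytes into `buf`; returns the count, 0 at end of input.
  virtual size_t read(char* buf, size_t size) = 0;
};

// Hypothetical source that reads from a block of memory.
class MemoryInputSketch : public InputSketch {
  const char* _base;
  size_t _limit;
  size_t _offset = 0;
 public:
  MemoryInputSketch(const char* base, size_t size) : _base(base), _limit(size) {}
  size_t read(char* buf, size_t size) override {
    size_t n = _limit - _offset;
    if (n > size) n = size;
    std::memcpy(buf, _base + _offset, n);
    _offset += n;
    return n;
  }
};

// Reads one line at a time; the pending buffer grows as needed, so a long
// but valid line is never rejected.
class LineReaderSketch {
  InputSketch* _input;
  std::string _pending;  // bytes read from the source but not yet returned
  bool _eof = false;
 public:
  explicit LineReaderSketch(InputSketch* input) : _input(input) {}
  // Stores the next line (without '\n') in `line`; false once input is exhausted.
  bool next_line(std::string& line) {
    for (;;) {
      size_t nl = _pending.find('\n');
      if (nl != std::string::npos) {
        line = _pending.substr(0, nl);
        _pending.erase(0, nl + 1);
        return true;
      }
      if (_eof) {
        if (_pending.empty()) return false;
        line = _pending;  // final line without a terminator
        _pending.clear();
        return true;
      }
      char chunk[8];  // deliberately small to exercise buffer growth
      size_t n = _input->read(chunk, sizeof(chunk));
      if (n == 0) _eof = true; else _pending.append(chunk, n);
    }
  }
};
```

A file-backed or socket-backed source would only need to implement `read`; the line-splitting logic stays the same.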
>> >> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) >> >> Credit: >> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . >> >> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). > > Ioi Lam has updated the pull request incrementally with three additional commits since the last revision: > > - BlockInputStream is used by gtest only, so moved it there > - removed unused set_position(), etc > - removed _must_free Hi Ioi, thanks for looking at my changes. I've got some more changes :-). Mainly: Shouldn't the `_small_buffer` be assigned to the `_buffer` pointer by default? This simplifies the code a bit. src/hotspot/share/utilities/istream.cpp line 119: > 117: assert(!definitely_done(), ""); // caller responsibility > 118: while (need_to_read()) { > 119: size_t fill_offset, fill_length; Nit: It's fine to move these decls out of the loop. src/hotspot/share/utilities/istream.cpp line 173: > 171: assert(_buffer_size > 0, ""); > 172: // and continue with at least a little buffer > 173: } Get rid of this branch, small buffer now default. src/hotspot/share/utilities/istream.cpp line 273: > 271: COV(EXB_S); > 272: new_buf = &_small_buffer[0]; > 273: new_length = sizeof(_small_buffer); Remove this branch, cannot ever happen as default is the small buffer. 
src/hotspot/share/utilities/istream.cpp line 378: > 376: return old_mode; > 377: } > 378: #endif //ASSERT I believe that we have support for `gcov` (grepping for it shows results for a `GCOV_ENABLED` flag), in which case that's what we should use for code coverage instrumentation. So, this should probably be deleted (along with the rest of it). It would be best if we got some docs for how to use gcov also, however. src/hotspot/share/utilities/istream.hpp line 111: > 109: > 110: bool has_c_heap_buffer() { > 111: return _buffer != nullptr && _buffer != &_small_buffer[0]; No need to check for nullptr. src/hotspot/share/utilities/istream.hpp line 196: > 194: bool fill_buffer(); > 195: > 196: // Find some room in the buffer so we call read on it. "so we **can** call read on it" probably src/hotspot/share/utilities/istream.hpp line 236: > 234: _end(0), > 235: _next(0), > 236: _line_count(0) {} Explicitly initialize the `_small_buffer` (`0` it out?). Set `_buffer` and `_buffer_size` to the small buffer by default. src/hotspot/share/utilities/istream.hpp line 247: > 245: virtual ~inputStream() { > 246: if (has_c_heap_buffer()) free_c_heap_buffer(); > 247: if (_input != nullptr) set_input(nullptr); In case we remove close: remove this call. 
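The review comments above all follow from one suggestion: point `_buffer` at `_small_buffer` from the start. A minimal sketch of that idea follows, assuming a hypothetical `LineBuffer` class (the name, the 16-byte size, and the `expand()` helper are illustrative and not taken from `istream.hpp`). With the small buffer as the default, `has_c_heap_buffer()` needs no `nullptr` check and the "no buffer yet" branches disappear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the buffer handling discussed in the review;
// this is NOT the actual inputStream class.
class LineBuffer {
  char   _small_buffer[16];  // inline storage, used until a line outgrows it
  char*  _buffer;            // always valid: small buffer or C-heap block
  size_t _buffer_size;

 public:
  // The small buffer is zeroed and installed as the default buffer.
  LineBuffer()
    : _small_buffer{},
      _buffer(&_small_buffer[0]),
      _buffer_size(sizeof(_small_buffer)) {}

  LineBuffer(const LineBuffer&) = delete;
  LineBuffer& operator=(const LineBuffer&) = delete;

  ~LineBuffer() {
    if (has_c_heap_buffer()) std::free(_buffer);
  }

  // No nullptr check needed: _buffer is never null in this design.
  bool has_c_heap_buffer() const { return _buffer != &_small_buffer[0]; }

  size_t capacity() const { return _buffer_size; }
  char*  data()           { return _buffer; }

  // Grow to at least new_size bytes, preserving the current contents.
  bool expand(size_t new_size) {
    if (new_size <= _buffer_size) return true;
    char* new_buf = static_cast<char*>(std::malloc(new_size));
    if (new_buf == nullptr) return false;
    std::memcpy(new_buf, _buffer, _buffer_size);
    if (has_c_heap_buffer()) std::free(_buffer);
    _buffer = new_buf;
    _buffer_size = new_size;
    return true;
  }
};
```

Whether the real class should zero the small buffer in its constructor is a separate question raised in the review; the sketch does so only to keep the example deterministic.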
------------- PR Review: https://git.openjdk.org/jdk/pull/18833#pullrequestreview-2035313060 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587380892 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587387927 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587390089 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587367132 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587395999 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587379361 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587385232 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587397065 From aph at redhat.com Thu May 2 10:52:29 2024 From: aph at redhat.com (Andrew Haley) Date: Thu, 2 May 2024 11:52:29 +0100 Subject: Aarch64: CPU_Model support for Neoverse N1/N2/V1/V2 In-Reply-To: <5b995833-1966-4e13-8ab2-b896456dbefc.jinguojie.jgj@alibaba-inc.com> References: <7ed32450-b9cc-491e-933e-fedb93c6bcf5.jinguojie.jgj@alibaba-inc.com> <5b995833-1966-4e13-8ab2-b896456dbefc.jinguojie.jgj@alibaba-inc.com> Message-ID: <0902f468-586f-474f-915f-bd23f88e294a@redhat.com> Created: https://bugs.openjdk.org/browse/JDK-8331556 -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ayang at openjdk.org Thu May 2 10:54:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 2 May 2024 10:54:03 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection Message-ID: It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probably fail), fall back to full-gc.
Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outside the VM (by a third-party tool). Test: tier1-6 ------------- Commit messages: - s1-do-collect Changes: https://git.openjdk.org/jdk/pull/19056/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331557 Stats: 555 lines in 15 files changed: 123 ins; 347 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/19056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19056/head:pull/19056 PR: https://git.openjdk.org/jdk/pull/19056 From stefank at openjdk.org Thu May 2 11:46:54 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 2 May 2024 11:46:54 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v3] In-Reply-To: References: Message-ID: <9Lq0XFLxSGEUUBekFbaZMQRDlA2FRbc_DQbjfxGFc80=.fd98a495-1ac7-4115-870c-86cfad4add69@github.com> On Thu, 2 May 2024 09:08:34 GMT, Doug Simon wrote: >> This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. >> >> The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode.
For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. This can fail as shown here: >> >> V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) >> V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) >> V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) >> V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) >> V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) >> V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) >> V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) >> V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) >> V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) >> V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) >> V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) >> V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) >> V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) >> V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) >> >> These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ec... 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments and suggestions > [addressed review comments and suggestions](https://github.com/openjdk/jdk/pull/18925/commits/545714a519ebd2f12173231b991875880e019965) Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18925#issuecomment-2090297732 From jsjolen at openjdk.org Thu May 2 12:30:54 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 12:30:54 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v4] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:54:32 GMT, Johan Sj?len wrote: >> Ioi Lam has updated the pull request incrementally with three additional commits since the last revision: >> >> - BlockInputStream is used by gtest only, so moved it there >> - removed unused set_position(), etc >> - removed _must_free > > src/hotspot/share/utilities/istream.cpp line 378: > >> 376: return old_mode; >> 377: } >> 378: #endif //ASSERT > > I believe that we have support for `gcov` (grepping for it shows results for a `GCOV_ENABLED` flag), in which case that's what we should use for code coverage instrumentation. So, this should probably be deleted (along with the rest of it). > > It would be best if we got some docs for how to use gcov also, however. Key is to provide configure with `--enable-native-coverage`. For `jib` we have `linux-x64-cov`, see https://github.com/openjdk/jdk/blob/master/make/conf/jib-profiles.js ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587552079 From rehn at openjdk.org Thu May 2 12:45:12 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 May 2024 12:45:12 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v4] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. 
> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. > Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. > The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. > - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. 
> > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - Corrected method name - Missed a ws - JALR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/e9bd4d6b..cb5ec446 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=02-03 Stats: 1275 lines in 90 files changed: 245 ins; 257 del; 773 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From jsjolen at openjdk.org Thu May 2 13:00:10 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 13:00:10 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v5] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. > > > Some example code: > ```c++ > // Before this patch this worked: > GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s > int& x = arr.at(7); > if (x == -1) { > x = 2; > } > assert(arr.at(7) == 2, "this holds"); > // but this was forbidden > int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E& > // so we had to do > int x = arr.at_grow(9, -1); > if (x == -1) { > arr.at_put(9, 2); > } > ``` > > Thanks.
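To make the quoted example concrete, here is a minimal sketch of what a reference-returning `at_grow` looks like. It assumes a hypothetical `TinyGrowableArray` backed by `std::vector`; the class name, constructor, and `demo_at_grow()` are illustrative and are not HotSpot's `GrowableArray`. The point is only that returning `E&` lets the caller read and update the slot in a single call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of a growable array; NOT HotSpot's GrowableArray.
template <typename E>
class TinyGrowableArray {
  std::vector<E> _data;

 public:
  // Pre-fill with initial_len copies of filler.
  TinyGrowableArray(int initial_len, const E& filler)
    : _data(initial_len, filler) {}

  int length() const { return static_cast<int>(_data.size()); }

  E& at(int i) {
    assert(0 <= i && i < length() && "illegal index");
    return _data[i];
  }

  const E& at(int i) const {
    assert(0 <= i && i < length() && "illegal index");
    return _data[i];
  }

  // Grows the array with copies of fill if i is past the end, then
  // returns a reference to slot i, so no separate at_put is needed.
  E& at_grow(int i, const E& fill) {
    assert(0 <= i && "illegal index");
    if (i >= length()) {
      _data.resize(i + 1, fill);
    }
    return _data[i];
  }
};

// Usage mirroring the quoted example: read and update through one reference.
inline int demo_at_grow() {
  TinyGrowableArray<int> arr(8, -1);
  int& x = arr.at_grow(9, -1);
  if (x == -1) {
    x = 2;
  }
  return arr.at(9);
}
```

One caveat of this sketch: a reference obtained from `at_grow` is invalidated if the array grows again, since the backing storage may move. It seems likely the same caution applies to any growable container that reallocates.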
Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Also re-add const to adr_at - Undo change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18975/files - new: https://git.openjdk.org/jdk/pull/18975/files/8d9607ad..3dc21ec4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=03-04 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18975/head:pull/18975 PR: https://git.openjdk.org/jdk/pull/18975 From jsjolen at openjdk.org Thu May 2 13:00:10 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 13:00:10 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v3] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 09:00:18 GMT, Johan Sjölen wrote: >> src/hotspot/share/utilities/growableArray.hpp line 153: >> >>> 151: E* adr_at(int i) const { >>> 152: assert(0 <= i && i < _len, "illegal index %d for length %d", i, _len); >>> 153: return &_data[i]; >> >> (GitHub won't let me put comment on the `adr_at` signature.) >> >> I think there should similarly be const and non-const adr_at, returning pointer to const and non-const respectively. > > Done! Alright, I'm reverting this change. The issue is that C1 and C2 aren't very `const`-correct, and fixing this would make the size of the PR blow up.
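For readers following the `adr_at` exchange, the reviewer's suggestion can be sketched as a const/non-const overload pair. The example assumes a hypothetical `FixedArray` template (not `GrowableArray`). It also shows why the change spreads: once the const overload returns `const E*`, any caller that holds a const array but writes through the pointer stops compiling.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical fixed-size container used only to illustrate the overload
// pair suggested in the review; NOT HotSpot's GrowableArray.
template <typename E, int N>
class FixedArray {
  E _data[N];

 public:
  FixedArray() : _data{} {}

  // Non-const array: callers may write through the returned pointer.
  E* adr_at(int i) {
    assert(0 <= i && i < N && "illegal index");
    return &_data[i];
  }

  // Const array: callers only get a pointer to const.
  const E* adr_at(int i) const {
    assert(0 <= i && i < N && "illegal index");
    return &_data[i];
  }
};

// Compile-time check that overload resolution picks the expected return types.
static_assert(std::is_same<decltype(std::declval<FixedArray<int, 4>&>().adr_at(0)),
                           int*>::value,
              "non-const adr_at returns E*");
static_assert(std::is_same<decltype(std::declval<const FixedArray<int, 4>&>().adr_at(0)),
                           const int*>::value,
              "const adr_at returns const E*");
```

The single `E* adr_at(int i) const` signature quoted in the thread hands out a mutable pointer even from a const array, so existing callers written against it compile without being const-correct. That would be consistent with the observation that tightening it affects a lot of C1 and C2 code.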
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1587591026 From jwaters at openjdk.org Thu May 2 13:38:53 2024 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 2 May 2024 13:38:53 GMT Subject: RFR: 8330539: Use #include <alloca.h> instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX In-Reply-To: References: Message-ID: <63kULaCivYNoomWQiTEK0n_p4p2K1-Me3b-R9ofBhHM=.34129f5f-3cae-47d7-a709-876b8a0ecdc3@github.com> On Thu, 2 May 2024 09:54:14 GMT, Joachim Kern wrote: > We need to find a better way to handle alloca on AIX. > > See the discussion in the PR for https://bugs.openjdk.org/browse/JDK-8329257, e.g. https://github.com/openjdk/jdk/pull/18536#discussion_r1568650313 in which three alternatives are suggested. Quoting: > > Let me summarize the choices we have and ask for your vote. > Magnus dislikes the -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 I introduced to get rid of > > #if defined(_AIX) > #include <alloca.h> > #endif > > in globalDefinitions_gcc.hpp. > > We have four possible solutions > > 1. Reintroduce > > #if defined(_AIX) > #include <alloca.h> > #endif > > in globalDefinitions_gcc.hpp. > > 2. Unconditionally introduce only #include <alloca.h> in globalDefinitions_gcc.hpp. This should work for all platforms using this header including the unofficial Windows/gcc Port, although only AIX needs it. > > 3. Add > > #if defined(_AIX) > #include <alloca.h> > #endif > > to the sources using alloca(). These are > /hotspot/share/runtime/os.cpp > /hotspot/share/runtime/javaThread.cpp > /hotspot/share/utilities/vmError.cpp > Here we need the AIX condition, because otherwise the classic Windows Build (NTAMD64) fails. > > 4. Replace -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 by -U__STRICT_ANSI__ at the same place. Explanation can also be found in https://github.com/openjdk/jdk/pull/18536#discussion_r1583360569 and following. > > I will implement the solution with the most likes and having no dislike.
I'd put alloca.h down below the Standard Headers, next to stuff like pthread.h and dlfcn.h, and have the AIX ifdef check so it's clearer that only AIX needs it, but otherwise no objections. Compilation failures are all unrelated to the actual change itself. ------------- Marked as reviewed by jwaters (Committer). PR Review: https://git.openjdk.org/jdk/pull/19053#pullrequestreview-2035766825 PR Comment: https://git.openjdk.org/jdk/pull/19053#issuecomment-2090522085 From mdoerr at openjdk.org Thu May 2 13:48:54 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 13:48:54 GMT Subject: RFR: 8330539: Use #include <alloca.h> instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:54:14 GMT, Joachim Kern wrote: > We need to find a better way to handle alloca on AIX. > > See the discussion in the PR for https://bugs.openjdk.org/browse/JDK-8329257, e.g. https://github.com/openjdk/jdk/pull/18536#discussion_r1568650313 in which three alternatives are suggested. Quoting: > > Let me summarize the choices we have and ask for your vote. > Magnus dislikes the -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 I introduced to get rid of > > #if defined(_AIX) > #include <alloca.h> > #endif > > in globalDefinitions_gcc.hpp. > > We have four possible solutions > > 1. Reintroduce > > #if defined(_AIX) > #include <alloca.h> > #endif > > in globalDefinitions_gcc.hpp. > > 2. Unconditionally introduce only #include <alloca.h> in globalDefinitions_gcc.hpp. This should work for all platforms using this header including the unofficial Windows/gcc Port, although only AIX needs it. > > 3. Add > > #if defined(_AIX) > #include <alloca.h> > #endif > > to the sources using alloca(). These are > /hotspot/share/runtime/os.cpp > /hotspot/share/runtime/javaThread.cpp > /hotspot/share/utilities/vmError.cpp > Here we need the AIX condition, because otherwise the classic Windows Build (NTAMD64) fails. > > 4.
Replace -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 by -U__STRICT_ANSI__ at the same place. Explanation can also be found in https://github.com/openjdk/jdk/pull/18536#discussion_r1583360569 and following. > > I will implement the solution with the most likes and having no dislike. LGTM. This is my preferred solution. GHA failures are there because this PR is based on the code before https://github.com/openjdk/jdk/commit/286cbf831c2eb76e31bd69b4a93cd5ae9a821493 . ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19053#pullrequestreview-2035796660 From jsjolen at openjdk.org Thu May 2 14:03:02 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 14:03:02 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v46] In-Reply-To: References: <2UMNj1LkcFJOj5bIOi8wJuscaXrGIHzPvlVTIpI-bw4=.38340e91-571b-4cff-8ffa-e32d602395a8@github.com> Message-ID: On Tue, 30 Apr 2024 06:18:53 GMT, Thomas Stuefe wrote: >> I'm fine with `typedef`:ing `size_t`, but I'd like a naming suggestion from you if that's alright. Naming isn't my strong suit and I'd prefer only doing the rename once :). > > If the type is defined within VMATree scope, it can be anything short and succinct, e.g. > `VMATree::position_t`. We typically do not use the `_t` suffix in HotSpot for types. I changed it to `position`, though.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587689356 From jsjolen at openjdk.org Thu May 2 14:15:02 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 14:15:02 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v54] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 16:34:05 GMT, Gerard Ziemski wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> assert device != nullptr in MemoryFileTracker::instance > > src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 67: > >> 65: // 4096 buckets ensures that probability of collision is 50% at approximately 64 >> 66: // different call stacks. >> 67: static const constexpr int nr_buckets = 4096; > > Shouldn't that be a prime number optimally, e.g. 4099? (ideally a Mersenne prime, but there is one at 127 and the next one is 8191) Good point, let's go for 4099. I clarified the comment a bit also. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587706249 From jsjolen at openjdk.org Thu May 2 14:15:03 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 14:15:03 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v54] In-Reply-To: References: <0z8n2nEWkKvUSaSN_UwDFykvB5xENEVGDfr0p4_SKw8=.5731dc50-c0cf-48e3-9f74-28db684a4ebf@github.com> <6zJC0o26l14DRheFDsjmnUnuytxe-aEEz8mOFrCTk1o=.3ab3f0ef-631d-4fc6-9eee-d7d210f11ea7@github.com> Message-ID: On Tue, 30 Apr 2024 17:32:46 GMT, Thomas Stuefe wrote: >> If the class is general enough, then perhaps it should be moved into `shared/utilities` so others can use it as well? A candidate for a follow up later? > > Yea, at least long term. For me it's fine in another RFE too, or if we see a second use for this class.
(One possibility I mentioned to StefanK recently was using this class to track zgc memory pages, which is currently done with linked lists). > > Up to you, @jdksjolen We're leaving it as `VMATree`, it's a private name for us NMT developers and we've all seen it and talked about it at this point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587709727 From jsjolen at openjdk.org Thu May 2 14:19:04 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 14:19:04 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v54] In-Reply-To: References: <0z8n2nEWkKvUSaSN_UwDFykvB5xENEVGDfr0p4_SKw8=.5731dc50-c0cf-48e3-9f74-28db684a4ebf@github.com> <6zJC0o26l14DRheFDsjmnUnuytxe-aEEz8mOFrCTk1o=.3ab3f0ef-631d-4fc6-9eee-d7d210f11ea7@github.com> Message-ID: On Thu, 2 May 2024 14:11:58 GMT, Johan Sj?len wrote: >> Yea, at least long term. For me its fine in another RFE too, or if we see a second use for this class. (One possibility I mentioned to StefanK recently was using this class to track zgc memory pages, which is currently done with linked lists). >> >> Up to you, @jdksjolen > > We're leaving it as `VMATree`, it's a private name for us NMT developers and we've all seen it and talked about it at this point. And to clarify: If this gets moved to utilities, then of course the name can be changed. It'd require a lot of other stuff to change also, probably. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587716410 From jsjolen at openjdk.org Thu May 2 14:19:04 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 14:19:04 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v54] In-Reply-To: References: Message-ID: <2VPttjo9RlGT1E3Lt5J482AH7fM7csbHMQUhUbVZ_LE=.aa68263d-a896-4327-bb99-56fb12deb834@github.com> On Mon, 29 Apr 2024 16:13:53 GMT, Gerard Ziemski wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> assert device != nullptr in MemoryFileTracker::instance > > src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 52: > >> 50: }; >> 51: NativeCallStack* put(const NativeCallStack& value) { >> 52: int bucket = value.calculate_hash() % nr_buckets; > > `calculate_hash()` is: > > > for (int i = 0; i < NMT_TrackingStackDepth; i++) { > hash += (uintptr_t)_stack[i]; > } > > > Wouldn't XOR serve us better here than plain "+" ? Yes, probably. This would be a separate RFE, however. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587716835 From jsjolen at openjdk.org Thu May 2 14:21:58 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 14:21:58 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v49] In-Reply-To: <7dLzx1ziOv1Qo2vfr8hAh9JRxas2TBvp6Zjvw206KRA=.59863075-5473-4de3-8d08-6f89817e4f8c@github.com> References: <1cKD_eCdTb8AmNQwA9T4GFK0xu_CjJeABePgatn8xSY=.ec58f99d-bcd6-4e92-87a4-d1e49d33f4af@github.com> <7dLzx1ziOv1Qo2vfr8hAh9JRxas2TBvp6Zjvw206KRA=.59863075-5473-4de3-8d08-6f89817e4f8c@github.com> Message-ID: On Sat, 27 Apr 2024 13:13:53 GMT, Gerard Ziemski wrote: >> That's a discussion that should take place in its own PR. 
> > I see 16 instances of the same pattern: > > > assert_post_init(); > if (!enabled()) return > > > in `memTracker.hpp` so it's local and isolated to `MemTracker` class. I didn't think this would be controversial/big deal to warrant its own PR? Maybe not, but this PR is already very large :). Changing this is not directly related to these changes, so it should be done in its own PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587722672 From mdoerr at openjdk.org Thu May 2 14:30:03 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 May 2024 14:30:03 GMT Subject: RFR: 8329257: AIX: Switch HOTSPOT_TOOLCHAIN_TYPE from xlc to gcc [v3] In-Reply-To: <1EgO9Z2UdqtUYN2oNClYl_evpBDw1asCxQRWPk0w_6E=.db209d00-845d-44bc-9ca1-e5c533087638@github.com> References: <-XeYeJ0OEmauTYsEoSXxzRmQXSKMOLw87GSpqDnEmug=.5cb7e71f-fea6-4a84-8260-5f515d3d3810@github.com> <18WjPZeDIWkxGIB0BJgyDg5VipCtY4EOlWmIGPWZGCw=.b50cf4a9-61a4-421e-97eb-3dbac94c14f9@github.com> <_xcaF7UUDHA11loD89Dz871vAQgRqMzCdPkahFDfKv8=.a2c6dcbe-5942-4fb7-9d8b-4239ea048e56@github.com> <76P7uKTuqo7IKYr5yBP4Vx1SS0AcEXC_6vDAU6LfIzo=.d939556f-6fab-4009-820b-821376bfdb7c@github.com> <6aR5nvKhz28A1CkxtaAD9CwTjILBjwZrrRwP3988oEc=.72203104-2ae5-40ff-bd87-168b684446e6@github.com> <1EgO9Z2UdqtUYN2oNClYl_evpBDw1asCxQRWPk0w_6E=.db209d00-845d-44bc-9ca1-e5c533087638@github.com> Message-ID: On Tue, 30 Apr 2024 16:36:52 GMT, Kim Barrett wrote: >> I will do after labor day and create a PR with this suggested solution in your JDK-8330539. > > I think I still prefer just unconditionally including <alloca.h> in globalDefinitions_gcc.hpp. For gcc/clang we are using `-std=c++14` + `-D_GNU_SOURCE` instead of `-std=gnu++14`. I forget exactly why. I don't really want > to be messing with `__STRICT_ANSI__`.
https://github.com/openjdk/jdk/pull/19053 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18536#discussion_r1587736142 From alanb at openjdk.org Thu May 2 14:31:55 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 2 May 2024 14:31:55 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Thu, 2 May 2024 07:33:09 GMT, Serguei Spitsyn wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. >> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. 
>> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: Corrections in: 1) JVMTI/JDWP spec; 2) test vthread checks; 3) test comments The update to the API specs looks okay. I think it was just a cut & paste error that the wrong text was copied into the description of the waiter_count and notify_waiter_count fields. I assume you'll update the CSR so it has the updated text. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19030#issuecomment-2090636854 From duke at openjdk.org Thu May 2 14:33:09 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Thu, 2 May 2024 14:33:09 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v5] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: > follow up 8267941 Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - Merge branch 'master' into JDK-8330694 - fix indentation - also tidy up - tidy up - rename ------------- Changes: https://git.openjdk.org/jdk/pull/18871/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=04 Stats: 995 lines in 124 files changed: 0 ins; 4 del; 991 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From stuefe at openjdk.org Thu May 2 14:36:00 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 14:36:00 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v54] In-Reply-To: References: <0z8n2nEWkKvUSaSN_UwDFykvB5xENEVGDfr0p4_SKw8=.5731dc50-c0cf-48e3-9f74-28db684a4ebf@github.com> <6zJC0o26l14DRheFDsjmnUnuytxe-aEEz8mOFrCTk1o=.3ab3f0ef-631d-4fc6-9eee-d7d210f11ea7@github.com> Message-ID: On Thu, 2 May 2024 14:16:03 GMT, Johan Sj?len wrote: >> We're leaving it as `VMATree`, it's a private name for us NMT developers and we've all seen it and talked about it at this point. > > And to clarify: If this gets moved to utilities, then of course the name can be changed. It'd require a lot of other stuff to change also, probably. Can live with that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587744751 From duke at openjdk.org Thu May 2 14:41:11 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Thu, 2 May 2024 14:41:11 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v6] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: <0EiTktH1vdO-lzORVl-eklrn69deCe3IRr8fUHUYW1s=.02a0bd31-47a7-45ea-9fd4-63d4a7bee286@github.com> > follow up 8267941 Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18871/files - new: https://git.openjdk.org/jdk/pull/18871/files/f4e066c4..92d0df4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From jsjolen at openjdk.org Thu May 2 14:43:37 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 2 May 2024 14:43:37 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v56] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch also acts as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - typedef size_t into position - Use prime number for number of buckets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/dc987443..dba53897 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=55 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=54-55 Stats: 16 lines in 3 files changed: 4 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From stuefe at openjdk.org Thu May 2 14:43:37 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 May 2024 14:43:37 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v46] In-Reply-To: References: Message-ID: <8Pkr2lOm0YS7yPAEZooSGXR1WhOwyDkv2ej0qxCOKp4=.513c6399-f24e-4145-bcc9-e19eb0243949@github.com> On Thu, 25 Apr 2024 10:27:13 GMT, Johan Sjölen wrote: >> src/hotspot/share/nmt/vmatree.hpp line 135: >> >>> 133: SummaryDiff register_mapping(size_t A, size_t B, StateType state, Metadata& metadata); >>> 134: >>> 135: SummaryDiff reserve_mapping(size_t from, size_t sz, Metadata& metadata) { >> >> If we use `reserve_mapping` for `uncommit_memory`, we need to set a `StackIndex` and a `MEMFLAGS` to pass as a `Metadata`. If we use `mtNone` for example, all the uncommitted amount would be accounted for `mtNone`. >> Would you please provide a `uncommit_mapping(address, size)` to handle these issues properly? > > Let's wait with this until we actually port over the `VirtualMemoryTracker` to use `VMATree`. I think we should rethink recording specific stacks for uncommitted memory.
I don't believe anyone cares who reserves uncommitted memory, or who uncommits memory. And this only leads to splintering the tree if we uncommit from different callsites. We should consider keeping stacks for committed memory only, and use some noop stack placeholder for uncommitted memory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1587752880 From shade at openjdk.org Thu May 2 14:47:18 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 May 2024 14:47:18 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs Message-ID: `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active`, could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). Doing this rename separately guarantees we have caught and renamed all current uses. Additional testing: - [ ] Linux AArch64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/19064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19064&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331573 Stats: 64 lines in 27 files changed: 0 ins; 2 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/19064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19064/head:pull/19064 PR: https://git.openjdk.org/jdk/pull/19064 From mli at openjdk.org Thu May 2 15:00:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 2 May 2024 15:00:03 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI Message-ID: Hi, Can you help to review this patch? Both auto-vectorization and the Vector API depend on this intrinsic. Thanks!
## Performance No performance test was done, as this depends on the vcpop.v instruction in the Zvbb extension, and the code sequence is simpler than the non-intrinsic version. ------------- Commit messages: - move vcpop_v - fix Zvbb flag and misc - remove unexpected local files - fix issue for mask version - Merge branch 'master' into popcount-v - merge master; fix issues - minor fix - merge master - PopCountV: Initial commit Changes: https://git.openjdk.org/jdk/pull/19065/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320995 Stats: 92 lines in 11 files changed: 80 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19065/head:pull/19065 PR: https://git.openjdk.org/jdk/pull/19065 From kbarrett at openjdk.org Thu May 2 15:11:53 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 2 May 2024 15:11:53 GMT Subject: RFR: 8330539: Use #include instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:54:14 GMT, Joachim Kern wrote: > We need to find a better way to handle alloca on AIX. > > See the discussion in the PR for https://bugs.openjdk.org/browse/JDK-8329257, e.g. https://github.com/openjdk/jdk/pull/18536#discussion_r1568650313 in which three alternatives are suggested. Quoting: > > Let me summarize the choices we have and ask for your vote. > Magnus dislikes the -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 I introduced to get rid of > > #if defined(_AIX) > #include > #endif > > in globalDefinitions_gcc.hpp. > > We have four possible solutions > > 1. Reintroduce > > #if defined(_AIX) > #include > #endif > > in globalDefinitions_gcc.hpp. > > 2. Unconditionally introduce only #include in globalDefinitions_gcc.hpp. This should work for all platforms using this header including the unofficial Windows/gcc Port, although only AIX needs it. > > 3.
Add > > #if defined(_AIX) > #include > #endif > > to the sources using alloca(). These are > /hotspot/share/runtime/os.cpp > /hotspot/share/runtime/javaThread.cpp > /hotspot/share/utilities/vmError.cpp > Here we need the AIX condition, because otherwise the classic Windows Build (NTAMD64) fails. > > 4. Replace -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 by -U__STRICT_ANSI__ at the same place. Explanation can also found in https://github.com/openjdk/jdk/pull/18536#discussion_r1583360569 and following. > > I will implement the solution with the most likes and having no dislike. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19053#pullrequestreview-2036034561 From stefank at openjdk.org Thu May 2 17:08:53 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 2 May 2024 17:08:53 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:40:35 GMT, Aleksey Shipilev wrote: > `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). > > Doing this rename separately guarantees we have caught and renamed all current uses. > > Additional testing: > - [ ] Linux AArch64 server fastdebug, `all` Seems like a good change. src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1270: > 1268: > 1269: ParallelScavengeHeap* heap = ParallelScavengeHeap::heap(); > 1270: assert(!heap->is_stw_gc_active(), "not reentrant"); While reading this I see that all these "not reentrant" asserts seems redundant given that we already do these checks in `IsSTWGCActiveMark`. 
Brownies points if you get rid of them. ;) src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1493: > 1491: PCAddThreadRootsMarkingTaskClosure(uint worker_id) : _worker_id(worker_id) { } > 1492: void do_thread(Thread* thread) { > 1493: assert(ParallelScavengeHeap::heap()->is_stw_gc_active(), "called outside gc"); Should this be updated to "called outside gc pause" as you did in `G1CollectedHeap::pin_object`? The same comment goes for the other occurrences below. ------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19064#pullrequestreview-2036315901 PR Review Comment: https://git.openjdk.org/jdk/pull/19064#discussion_r1587988542 PR Review Comment: https://git.openjdk.org/jdk/pull/19064#discussion_r1587974562 From stefank at openjdk.org Thu May 2 17:14:55 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 2 May 2024 17:14:55 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: <168Z8R7UX0uYymkZ_NCW17QbrwLu_82rg0j4nJ895_E=.5e1c4624-770f-4e94-9588-797982c2cca9@github.com> On Thu, 2 May 2024 14:40:35 GMT, Aleksey Shipilev wrote: > `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). > > Doing this rename separately guarantees we have caught and renamed all current uses. > > Additional testing: > - [ ] Linux AArch64 server fastdebug, `all` Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19064#pullrequestreview-2036349526 From shade at openjdk.org Thu May 2 17:14:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 May 2024 17:14:57 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: On Thu, 2 May 2024 16:56:11 GMT, Stefan Karlsson wrote: >> `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). >> >> Doing this rename separately guarantees we have caught and renamed all current uses. >> >> Additional testing: >> - [ ] Linux AArch64 server fastdebug, `all` > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1493: > >> 1491: PCAddThreadRootsMarkingTaskClosure(uint worker_id) : _worker_id(worker_id) { } >> 1492: void do_thread(Thread* thread) { >> 1493: assert(ParallelScavengeHeap::heap()->is_stw_gc_active(), "called outside gc"); > > Should this be updated to "called outside gc pause" as you did in `G1CollectedHeap::pin_object`? The same comment goes for the other occurrences below. I deliberately stopped myself from doing this for Parallel GC code, where every GC is STW GC :) I can change to "GC pause" if you want. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19064#discussion_r1587999105 From stefank at openjdk.org Thu May 2 17:29:53 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 2 May 2024 17:29:53 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: <8rTp30vldMrfGYMh6uP-tirE9bjNGTBePOSztx95MD4=.8f9cdead-0301-42c4-acad-a2bfc26b4702@github.com> On Thu, 2 May 2024 17:23:21 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1270: >> >>> 1268: >>> 1269: ParallelScavengeHeap* heap = ParallelScavengeHeap::heap(); >>> 1270: assert(!heap->is_stw_gc_active(), "not reentrant"); >> >> While reading this I see that all these "not reentrant" asserts seems redundant given that we already do these checks in `IsSTWGCActiveMark`. Brownies points if you get rid of them. ;) > > Ah, hm. Indeed! Separate PR? There is some light cleanup in G1 that can be associated with it. This PR would keep with just a mechanical rename. Sounds like a good idea. >> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1493: >> >>> 1491: PCAddThreadRootsMarkingTaskClosure(uint worker_id) : _worker_id(worker_id) { } >>> 1492: void do_thread(Thread* thread) { >>> 1493: assert(ParallelScavengeHeap::heap()->is_stw_gc_active(), "called outside gc"); >> >> Should this be updated to "called outside gc pause" as you did in `G1CollectedHeap::pin_object`? The same comment goes for the other occurrences below. > > I deliberately stopped myself from doing this for Parallel GC code, where every GC is STW GC :) I can change to "GC pause" if you want. Ah, I see. I wouldn't mind if it were changed to include "pause", but I'm also OK with you leaving it as is. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19064#discussion_r1588019866 PR Review Comment: https://git.openjdk.org/jdk/pull/19064#discussion_r1588018382 From shade at openjdk.org Thu May 2 17:29:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 May 2024 17:29:52 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: On Thu, 2 May 2024 17:04:44 GMT, Stefan Karlsson wrote: >> `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). >> >> Doing this rename separately guarantees we have caught and renamed all current uses. >> >> Additional testing: >> - [ ] Linux AArch64 server fastdebug, `all` > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1270: > >> 1268: >> 1269: ParallelScavengeHeap* heap = ParallelScavengeHeap::heap(); >> 1270: assert(!heap->is_stw_gc_active(), "not reentrant"); > > While reading this I see that all these "not reentrant" asserts seems redundant given that we already do these checks in `IsSTWGCActiveMark`. Brownies points if you get rid of them. ;) Ah, hm. Indeed! Separate PR? There is some light cleanup in G1 that can be associated with it. This PR would keep with just a mechanical rename. 
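The redundancy discussed in this exchange can be illustrated with a small standalone sketch. This is not HotSpot code: `ToyHeap` and `ToySTWGCActiveMark` are hypothetical stand-ins, written under the assumption (taken only from the review comment above) that `IsSTWGCActiveMark` is a scoped object that already checks for reentrancy when it is constructed. If that holds, a separate "not reentrant" assert placed right before the mark repeats the same check.

```cpp
#include <cassert>

// Hypothetical stand-in for the heap; only models the renamed flag.
class ToyHeap {
  bool _is_stw_gc_active = false;
public:
  bool is_stw_gc_active() const { return _is_stw_gc_active; }
  void set_is_stw_gc_active(bool v) { _is_stw_gc_active = v; }
};

// Hypothetical scoped mark: sets the flag for the duration of a pause.
// The constructor performs the reentrancy check itself, so callers do
// not need their own assert before creating the mark.
class ToySTWGCActiveMark {
  ToyHeap& _heap;
public:
  explicit ToySTWGCActiveMark(ToyHeap& heap) : _heap(heap) {
    assert(!_heap.is_stw_gc_active() && "not reentrant");
    _heap.set_is_stw_gc_active(true);
  }
  ~ToySTWGCActiveMark() {
    assert(_heap.is_stw_gc_active() && "flag must still be set");
    _heap.set_is_stw_gc_active(false);
  }
  ToySTWGCActiveMark(const ToySTWGCActiveMark&) = delete;
  ToySTWGCActiveMark& operator=(const ToySTWGCActiveMark&) = delete;
};

// Simulated pause: returns what code running inside the pause observes.
bool toy_collect(ToyHeap& heap) {
  ToySTWGCActiveMark mark(heap);   // reentrancy is checked here
  return heap.is_stw_gc_active();  // true while the mark is alive
}
```

Calling `toy_collect` on a fresh `ToyHeap` observes the flag as set during the simulated pause and cleared again afterwards, which is the property the scoped mark is meant to guarantee in this model.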
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19064#discussion_r1588015870 From zgu at openjdk.org Thu May 2 17:34:52 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 2 May 2024 17:34:52 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:40:35 GMT, Aleksey Shipilev wrote: > `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). > > Doing this rename separately guarantees we have caught and renamed all current uses. > > Additional testing: > - [ ] Linux AArch64 server fastdebug, `all` LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19064#pullrequestreview-2036411170 From sspitsyn at openjdk.org Thu May 2 17:47:54 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 May 2024 17:47:54 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: <-vAMg5Qz5ofCPtk7lYfp1fsmg98Rk_MpdiHHHK_rN5g=.2b836e19-d999-48bb-a778-1a0efeb17079@github.com> On Thu, 2 May 2024 07:33:09 GMT, Serguei Spitsyn wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. 
>> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. >> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: Corrections in: 1) JVMTI/JDWP spec; 2) test vthread checks; 3) test comments Thank you, Alan. I've updated the CSR with the recent diffs. Also, added a statement about separate deprecation plans. 
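The degraded behaviour described in the quoted PR summary can be modelled roughly as follows. This is only a sketch of the specified semantics, not the JVM TI implementation: all types and the function `toy_get_object_monitor_usage` are hypothetical, and the only rules encoded are the two stated above (an owner is reported only if it is a platform thread, and virtual threads are left out of both the `waiters` and `notify_waiters` lists).

```cpp
#include <string>
#include <vector>

// Hypothetical thread model: just a name and a virtual/platform flag.
struct ToyThread {
  std::string name;
  bool is_virtual;
};

// Hypothetical view of a monitor's true state inside the VM.
struct ToyMonitorState {
  const ToyThread* owner;                  // nullptr if unowned
  std::vector<ToyThread> waiters;          // threads waiting to own the monitor
  std::vector<ToyThread> notify_waiters;   // threads waiting to be notified
};

// Hypothetical result, loosely mirroring the role of jvmtiMonitorUsage.
struct ToyMonitorUsage {
  bool has_owner = false;
  std::string owner;
  std::vector<std::string> waiters;
  std::vector<std::string> notify_waiters;
};

// Keep only platform threads from a list.
static std::vector<std::string> platform_only(const std::vector<ToyThread>& threads) {
  std::vector<std::string> names;
  for (const ToyThread& t : threads) {
    if (!t.is_virtual) {
      names.push_back(t.name);
    }
  }
  return names;
}

// Report usage under the degraded rules: virtual threads are invisible.
ToyMonitorUsage toy_get_object_monitor_usage(const ToyMonitorState& state) {
  ToyMonitorUsage usage;
  if (state.owner != nullptr && !state.owner->is_virtual) {
    usage.has_owner = true;
    usage.owner = state.owner->name;
  }
  usage.waiters = platform_only(state.waiters);
  usage.notify_waiters = platform_only(state.notify_waiters);
  return usage;
}
```

In this model a monitor owned by a virtual thread is reported as having no owner, and the waiter counts fall to zero when only virtual threads are waiting, which matches the expected values the updated test uses for the virtual thread case.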
------------- PR Comment: https://git.openjdk.org/jdk/pull/19030#issuecomment-2091156003 From never at openjdk.org Thu May 2 18:49:55 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 2 May 2024 18:49:55 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:08:34 GMT, Doug Simon wrote: >> This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. >> >> The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. 
This can fail as shown here: >> >> V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) >> V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) >> V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) >> V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) >> V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) >> V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) >> V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) >> V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) >> V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) >> V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) >> V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) >> V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) >> V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) >> V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) >> >> These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ec... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments and suggestions The new version looks great to me. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18925#pullrequestreview-2036608061 From kevinw at openjdk.org Thu May 2 19:16:18 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 2 May 2024 19:16:18 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v5] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: Missing include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/3e9dd511..562859fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From kevinw at openjdk.org Thu May 2 19:40:18 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 2 May 2024 19:40:18 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. 
Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: monitor->owner() == nullptr handling in fill_in ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/562859fb..54086ccd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=04-05 Stats: 7 lines in 1 file changed: 0 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From kevinw at openjdk.org Thu May 2 19:40:18 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 2 May 2024 19:40:18 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6] In-Reply-To: References: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> <4U-AP8zHxJrxwXYoTcxlpn5OvztYUW-ijTAd5TJ3I_4=.731aeb9c-115a-40c8-9298-577e0fada9ce@github.com> Message-ID: <2YLBGUEZCYrtkqsT4jCdmLrNzGhdsN8U6YOY_hqkbo0=.597115db-ac29-4fa3-84c3-b713b5652c85@github.com> On Thu, 2 May 2024 00:10:32 GMT, Dean Long wrote: >> I can follow that logic but ... if it is null then what is this code actually doing? We have determined that the frame does contain locked monitors and so we are transferring them across. How can such a locked monitor have a null object? > > I assume it's only for the `fill_in` `realloc_failures` case. But you're right, it doesn't seem very useful. It's just going to look like an unlocked monitor slot in the interpreter frame. We could consider skipping these in `fill_in`, then they won't show up later in `unpack_on_stack`(). fill_in() has previously OK with seeing monitor->owner() == nullptr so it's already setting dest->set_obj(null) under some conditions. I see we can handle the null separately and simplify the asserts there. 
vframeArrayElement::unpack_on_stack() Still might retrieve a null, so the asserts there keep the guard against doing the owner check -- I'm not sure if they won't show up there - it loops over the number of elements in the MonitorChunk* so it should see them all? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1588258872 From ihse at openjdk.org Thu May 2 21:21:57 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 2 May 2024 21:21:57 GMT Subject: RFR: 8330539: Use #include instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:54:14 GMT, Joachim Kern wrote: > We need to find a better way to handle alloca on AIX. > > See the discussion in the PR for https://bugs.openjdk.org/browse/JDK-8329257, e.g. https://github.com/openjdk/jdk/pull/18536#discussion_r1568650313 in which three alternatives are suggested. Quoting: > > Let me summarize the choices we have and ask for your vote. > Magnus dislikes the -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 I introduced to get rid of > > #if defined(_AIX) > #include > #endif > > in globalDefinitions_gcc.hpp. > > We have four possible solutions > > 1. Reintroduce > > #if defined(_AIX) > #include > #endif > > in globalDefinitions_gcc.hpp. > > 2. Unconditionally introduce only #include in globalDefinitions_gcc.hpp. This should work for all platforms using this header including the unofficial Windows/gcc Port, although only AIX needs it. > > 3. Add > > #if defined(_AIX) > #include > #endif > > to the sources using alloca(). These are > /hotspot/share/runtime/os.cpp > /hotspot/share/runtime/javaThread.cpp > /hotspot/share/utilities/vmError.cpp > Here we need the AIX condition, because otherwise the classic Windows Build (NTAMD64) fails. > > 4. Replace -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 by -U__STRICT_ANSI__ at the same place. 
Explanation can also found in https://github.com/openjdk/jdk/pull/18536#discussion_r1583360569 and following. > > I will implement the solution with the most likes and having no dislike. Marked as reviewed by ihse (Reviewer). Looks good to me too. ------------- PR Review: https://git.openjdk.org/jdk/pull/19053#pullrequestreview-2036941518 PR Comment: https://git.openjdk.org/jdk/pull/19053#issuecomment-2091668231 From cjplummer at openjdk.org Thu May 2 21:35:54 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 May 2024 21:35:54 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v6] In-Reply-To: References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: <7mbaVxLkGeEnczjYFmrAtu3vtwdZmZxgH06ZxAradkY=.e8ff6e41-4fea-4517-86d3-89f432b37ccd@github.com> On Sat, 20 Apr 2024 03:04:23 GMT, Chris Plummer wrote: >> Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/PointerLocation.java line 131: > >> 129: } >> 130: >> 131: public G1HeapRegion getHeapRegion() { > > Do we want to rename to getG1HeapRegion? It seems you agreed with this suggestion but the change was never made. > test/hotspot/jtreg/serviceability/sa/TestG1HeapRegion.java line 62: > >> 60: agent.attach(Integer.parseInt(pid)); >> 61: G1CollectedHeap heap = (G1CollectedHeap)VM.getVM().getUniverse().heap(); >> 62: G1HeapRegion hr = heap.hrm().heapRegionIterator().next(); > > "g1HeapRegionIterator"? And here also it seems you agreed with this suggestion but the change was never made. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18871#discussion_r1588433200 PR Review Comment: https://git.openjdk.org/jdk/pull/18871#discussion_r1588434351 From cjplummer at openjdk.org Thu May 2 21:38:53 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 May 2024 21:38:53 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> Message-ID: On Thu, 2 May 2024 06:45:39 GMT, Serguei Spitsyn wrote: >> test/hotspot/jtreg/serviceability/jvmti/ObjectMonitorUsage/ObjectMonitorUsage.java line 257: >> >>> 255: // Correct the expected values for the virtual thread case. >>> 256: int expEnteringCount = isVirtual ? 0 : NUMBER_OF_ENTERING_THREADS; >>> 257: int expWaitingCount = isVirtual ? 0 : NUMBER_OF_WAITING_THREADS; >> >> There are comments below that still reference NUMBER_OF_ENTERING_THREADS and NUMBER_OF_WAITING_THREADS. > > Thank you for the comment. In fact, I don't know how to fix it. > Replacing of NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS in comments with `expEnteringCount/expWaitingCount` does not make sense to me. The comments are about the tested pattern, not about the real values. Please, let me know if you have any suggestion on fixing. expEnteringCount/expWaitingCount contain the tested patterns. I don't see why they can't just replace NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS in the comments also. In fact it is confusing if you don't because code right below the comments references expEnteringCount/expWaitingCount, not NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1588440431 From cjplummer at openjdk.org Thu May 2 21:52:53 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 May 2024 21:52:53 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Thu, 2 May 2024 07:33:09 GMT, Serguei Spitsyn wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. >> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. 
>> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: Corrections in: 1) JVMTI/JDWP spec; 2) test vthread checks; 3) test comments src/hotspot/share/prims/jvmti.xml line 8280: > 8278: > 8279: The number of platform threads waiting to own this monitor, or 0 > 8280: if only virtual threads are waiting or no threads are waiting This is now exactly the same as `waiter_count` above. I don't think this is what you intended. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1588453160 From cjplummer at openjdk.org Thu May 2 21:52:54 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 May 2024 21:52:54 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 20:42:16 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: Corrections in: 1) JVMTI/JDWP spec; 2) test vthread checks; 3) test comments > > src/java.se/share/data/jdwp/jdwp.spec line 1622: > >> 1620: (threadObject owner "The platform thread owning this monitor, or nullptr " >> 1621: "if owned` by a virtual thread or not owned.") >> 1622: (int entryCount "The number of times the owning platform thread has entered the monitor.") > > See the comment I left for the JVMTI spec. We should be more complete in the explanation here, explaining how it is 0 for virtual threads. I don't think this has been resolved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1588459462 From cjplummer at openjdk.org Thu May 2 21:52:55 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 May 2024 21:52:55 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> Message-ID: <2lhm2l4CzUnyStTj215njaZg9EcMwwKWxMxtdZTXD8I=.ba8b1275-f16c-4af4-80e5-81ace9b40aa2@github.com> On Thu, 2 May 2024 21:36:27 GMT, Chris Plummer wrote: >> Thank you for the comment. In fact, I don't know how to fix it. 
>> Replacing of NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS in comments with `expEnteringCount/expWaitingCount` does not make sense to me. The comments are about the tested pattern, not about the real values. Please, let me know if you have any suggestion on fixing.
>
> expEnteringCount/expWaitingCount contain the tested patterns. I don't see why they can't just replace NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS in the comments also. In fact it is confusing if you don't because code right below the comments references expEnteringCount/expWaitingCount, not NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS.

...and there are also comments above with this issue.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1588456637

From cjplummer at openjdk.org Thu May 2 22:42:52 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Thu, 2 May 2024 22:42:52 GMT
Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed
In-Reply-To:
References:
Message-ID:

On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote:

> Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition.
> This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area.
>
> Testing:
> - TBD: submit mach5 tiers 1-6

I looked at other places where the following is already in place:

`return; // no events should be posted if thread is in any VTMS transition`

I can understand the rationale for not sending events in those cases (like breakpoint, singlestep, and methodentry). However, the loss of ClassPrepare and ClassLoad events seems a bit more significant for profilers that might be trying to accurately track all class loading.
It seems maybe we should instead be trying to avoid these events by preloading the classes as was suggested as an option in the CR. ------------- PR Review: https://git.openjdk.org/jdk/pull/19054#pullrequestreview-2037080382 From cjplummer at openjdk.org Thu May 2 22:46:52 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 May 2024 22:46:52 GMT Subject: RFR: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly [v3] In-Reply-To: References: Message-ID: <7DToDczTkXlyv-tvpHtlVJA33LDsAfV_2t7uzQ5SNSI=.b68b2474-eb3e-4e89-a84f-42403104585a@github.com> On Tue, 30 Apr 2024 23:48:02 GMT, Alex Menkov wrote: >> Some cleanup related to JvmtiEnvBase::get_threadOop_and_JavaThread method >> >> Testing: tier1-6 > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > renamed current_thread tp current Given the number of `current` renames (which distracts from the core change in this PR), and given that not all occurrences were renamed (only those that were touched), I think it would be best to leave the rename for a separate PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18986#issuecomment-2091868298 From fyang at openjdk.org Thu May 2 23:19:51 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 2 May 2024 23:19:51 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> Message-ID: On Thu, 2 May 2024 07:06:51 GMT, Robbin Ehn wrote: > We still need relocates rt_call, not sure why you removed it. 
I removed the relocate because I am thinking it should be absolute calls in the else block of `MacroAssembler::rt_call` [1], so no reloc required, as you mentioned in your previous comments. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L5027 > It seem like we need two version of rt_call one with address and one with Address. Then it seem like we could remove far_call as the rt_call would do the right thing. > > I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch. Is there a reason we need to include such in changes in this PR? On holiday this week, we can discuss further next week :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1588519824 From amenkov at openjdk.org Thu May 2 23:58:52 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 2 May 2024 23:58:52 GMT Subject: RFR: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly [v3] In-Reply-To: References: Message-ID: <5jmKQ3eqaYVEG5niaFpjNT4F4wKyjufHXTv_AbWTp2U=.7df58d35-552a-4677-ac59-8902ef4ad42c@github.com> On Tue, 30 Apr 2024 23:48:02 GMT, Alex Menkov wrote: >> Some cleanup related to JvmtiEnvBase::get_threadOop_and_JavaThread method >> >> Testing: tier1-6 > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > renamed current_thread tp current ok, I'll revert last update and rename current_thread to current only in few places where new variable is introduced ------------- PR Comment: https://git.openjdk.org/jdk/pull/18986#issuecomment-2091927214 From jwaters at openjdk.org Fri May 3 00:40:52 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 3 May 2024 00:40:52 GMT Subject: RFR: 8330539: Use #include instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:54:14 GMT, Joachim Kern wrote: 
> We need to find a better way to handle alloca on AIX.
>
> See the discussion in the PR for https://bugs.openjdk.org/browse/JDK-8329257, e.g. https://github.com/openjdk/jdk/pull/18536#discussion_r1568650313 in which three alternatives are suggested. Quoting:
>
> Let me summarize the choices we have and ask for your vote.
> Magnus dislikes the -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 I introduced to get rid of
>
> #if defined(_AIX)
> #include <alloca.h>
> #endif
>
> in globalDefinitions_gcc.hpp.
>
> We have four possible solutions
>
> 1. Reintroduce
>
> #if defined(_AIX)
> #include <alloca.h>
> #endif
>
> in globalDefinitions_gcc.hpp.
>
> 2. Unconditionally introduce only #include <alloca.h> in globalDefinitions_gcc.hpp. This should work for all platforms using this header including the unofficial Windows/gcc Port, although only AIX needs it.
>
> 3. Add
>
> #if defined(_AIX)
> #include <alloca.h>
> #endif
>
> to the sources using alloca(). These are
> /hotspot/share/runtime/os.cpp
> /hotspot/share/runtime/javaThread.cpp
> /hotspot/share/utilities/vmError.cpp
> Here we need the AIX condition, because otherwise the classic Windows Build (NTAMD64) fails.
>
> 4. Replace -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 by -U__STRICT_ANSI__ at the same place. An explanation can also be found in https://github.com/openjdk/jdk/pull/18536#discussion_r1583360569 and following.
>
> I will implement the solution with the most likes and having no dislike.

I stand corrected, it seems that MinGW distributions don't have alloca.h in their headers.
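As a rough illustration of option 3 in the quoted list (an AIX-only include placed in each source file that calls alloca()), a minimal sketch might look like the following. The helper `scratch_copy_length` is hypothetical and is not HotSpot code, and the sketch assumes a GCC-like toolchain where `<cstdlib>` already makes `alloca` visible on the non-AIX platforms it is built on.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#if defined(_AIX)
#include <alloca.h>  // option 3: the explicit header is only pulled in on AIX
#endif

// Hypothetical helper (an assumption for illustration, not HotSpot code):
// copies `text` into stack scratch space obtained from alloca() and
// returns the length of that copy.
static std::size_t scratch_copy_length(const char* text) {
    const std::size_t size = std::strlen(text) + 1;    // include the terminator
    char* scratch = static_cast<char*>(alloca(size));  // released when the function returns
    std::memcpy(scratch, text, size);
    return std::strlen(scratch);
}
```

Option 1 from the same list would move the guarded include into globalDefinitions_gcc.hpp instead of repeating it in every file that uses alloca().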
No worries, the Windows/gcc Port will handle this downstream ------------- PR Comment: https://git.openjdk.org/jdk/pull/19053#issuecomment-2091958780 From dlong at openjdk.org Fri May 3 01:52:56 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 May 2024 01:52:56 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 19:40:18 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > monitor->owner() == nullptr handling in fill_in src/hotspot/share/runtime/javaThread.hpp line 676: > 674: > 675: // Fast-locking support (not for LM_LIGHTWEIGHT) > 676: bool is_lock_owned(address adr) const; Suggestion: // Stack-locking support (not for LM_LIGHTWEIGHT) bool is_lock_owned(address stack_adr) const; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1588598705 From amenkov at openjdk.org Fri May 3 01:54:24 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 3 May 2024 01:54:24 GMT Subject: RFR: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly [v4] In-Reply-To: References: Message-ID: > Some cleanup related to JvmtiEnvBase::get_threadOop_and_JavaThread method > > Testing: tier1-6 Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: - update - Revert "renamed current_thread to current" This reverts commit d5d614bcf0861466acd695296e974d2253f84c9f. - Revert "renamed current_thread tp current" This reverts commit 4602632221044aa754a1bc8d11e7a3e9a0092590. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18986/files - new: https://git.openjdk.org/jdk/pull/18986/files/46026322..9be24a4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18986&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18986&range=02-03 Stats: 122 lines in 2 files changed: 0 ins; 0 del; 122 mod Patch: https://git.openjdk.org/jdk/pull/18986.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18986/head:pull/18986 PR: https://git.openjdk.org/jdk/pull/18986 From dlong at openjdk.org Fri May 3 02:04:59 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 May 2024 02:04:59 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 19:40:18 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > monitor->owner() == nullptr handling in fill_in src/hotspot/share/runtime/synchronizer.cpp line 1060: > 1058: // the ObjectMonitor. > 1059: } else if (LockingMode == LM_LEGACY && mark.has_locker() > 1060: && JavaThread::cast(current)->is_lock_owned((address)mark.locker())) { This looks risky. How about guarding it with a check for current->is_Java_thread()? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1588602588 From dlong at openjdk.org Fri May 3 02:22:56 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 May 2024 02:22:56 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 19:40:18 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. 
>> This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself.
>>
>> Access to it in is_lock_owned() was racy, and caused rare crashes.
>
> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision:
>
>   monitor->owner() == nullptr handling in fill_in

src/hotspot/share/runtime/vframeArray.cpp line 97:

> 95:       dest->set_obj(nullptr);
> 96:     } else {
> 97:       assert(!monitor->owner()->is_unlocked(), "object must be null or locked");

Suggestion:

    assert(monitor->owner() != nullptr, "monitor owner must not be null");
    assert(!monitor->owner()->is_unlocked(), "monitor must be locked");

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1588608773

From baikaishiuc at gmail.com Fri May 3 02:32:10 2024
From: baikaishiuc at gmail.com (zhengxianwei)
Date: Fri, 3 May 2024 10:32:10 +0800
Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation
Message-ID:

Hello everyone,

I'm a simulator developer working on ARM <-> X86 simulator development. I've run into an issue: a crash occurred while running an x86 version of dbeaver (https://github.com/dbeaver/dbeaver) on an ARM processor. As a Java novice, I tried debugging with jdb and initially found that the instanceof operation returns different results on x86 and ARM architectures. So, I wanted to examine the specific execution process of instanceof using the -Xint mode. However, I could only find the instanceof execution during the JIT process (https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243). I couldn't locate where instanceof is executed in the interpreter. I tried searching all files within the project containing instanceof and adding print statements, but to no avail. Perhaps I made a mistake somewhere.
I thought the most likely place where the interpreter executes instanceof would be:
https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L2079
However, it seems that it didn't execute there either. So, my question is, where does the openjdk JVM interpreter execute the bytecode instanceof operation?

From dlong at openjdk.org Fri May 3 02:45:53 2024
From: dlong at openjdk.org (Dean Long)
Date: Fri, 3 May 2024 02:45:53 GMT
Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 2 May 2024 19:40:18 GMT, Kevin Walls wrote:

>> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself.
>>
>> Access to it in is_lock_owned() was racy, and caused rare crashes.
>
> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision:
>
>   monitor->owner() == nullptr handling in fill_in

src/hotspot/share/runtime/vframeArray.cpp line 94:

> 92:     assert(!monitor->owner_is_scalar_replaced() || realloc_failures, "object should be reallocated already");
> 93:     BasicObjectLock* dest = _monitors->at(index);
> 94:     if (monitor->owner_is_scalar_replaced() || monitor->owner() == nullptr) {

The only way to get a null owner is if owner_is_scalar_replaced() is true:
https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/share/runtime/stackValue.hpp#L52
and to get this far with it still null means `realloc_failures` is true. We could avoid a null check later in unpack_on_stack if we skip adding to _monitors in this case. So maybe use a GrowableArray inside MonitorChunk and add elements using append().
Suggestion:

    if (monitor->owner_is_scalar_replaced()) {

src/hotspot/share/runtime/vframeArray.cpp line 316:

> 314:       BasicObjectLock* src = _monitors->at(index);
> 315:       top->set_obj(src->obj());
> 316:       assert(src->obj() == nullptr || ObjectSynchronizer::current_thread_holds_lock(thread, Handle(thread, src->obj())),

No need for null check if we don't add null owners to _monitors.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1588615873
PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1588616143

From baikaishiuc at gmail.com Fri May 3 02:56:23 2024
From: baikaishiuc at gmail.com (zhengxianwei)
Date: Fri, 3 May 2024 10:56:23 +0800
Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation
In-Reply-To:
References:
Message-ID:

Sorry, the format of the previous email was incorrect. I've rearranged it.

Hello everyone,

I'm a simulator developer working on ARM <-> X86 simulator development. I've run into an issue: a crash occurred while running an x86 version of dbeaver (https://github.com/dbeaver/dbeaver) on an ARM processor. As a Java novice, I tried debugging with jdb and initially found that the instanceof operation returns different results on x86 and ARM architectures. So, I wanted to examine the specific execution process of instanceof using the -Xint mode. However, I could only find the instanceof execution during the JIT process (https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243). I couldn't locate where instanceof is executed in the interpreter. I tried searching all files within the project containing instanceof and adding print statements, but to no avail. Perhaps I made a mistake somewhere.
I thought the most likely place where the interpreter executes instanceof would be:
https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L2079
However, it seems that it didn't execute there either. So, my question is, where does the openjdk JVM interpreter execute the bytecode instanceof operation?

On Fri, May 3, 2024 at 10:32 AM zhengxianwei wrote:

> Hello everyone,
>
> I'm a simulator developer working on ARM <-> X86 simulator development. I've run into an issue: a crash occurred while running an x86 version of dbeaver (https://github.com/dbeaver/dbeaver) on an ARM processor. As a Java novice, I tried debugging with jdb and initially found that the instanceof operation returns different results on x86 and ARM architectures. So, I wanted to examine the specific execution process of instanceof using the -Xint mode. However, I could only find the instanceof execution during the JIT process (https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243). I couldn't locate where instanceof is executed in the interpreter. I tried searching all files within the project containing instanceof and adding print statements, but to no avail. Perhaps I made a mistake somewhere.
> I thought the most likely place where the interpreter executes instanceof would be:
> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L2079
> However, it seems that it didn't execute there either. So, my question is, where does the openjdk JVM interpreter execute the bytecode instanceof operation?

From tanksherman27 at gmail.com Fri May 3 03:02:30 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Fri, 3 May 2024 11:02:30 +0800
Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation
Message-ID:

Hi Xian Wei,

No, you are right! The code in templateTable_x86.cpp that you linked to in your post is not part of the Just in Time Compilers, it is part of the x86 Interpreter! The Java HotSpot VM actually has 2 different Interpreters, the primary Interpreter is written in large chunks of assembly specific to each platform, which is then processed by the HotSpot macro assemblers. The bytecodeInterpreter.cpp file you linked to is part of the second and less often used Interpreter, which is why modifying the bytecodeInterpreter.cpp instanceof implementation did nothing in your case (The Interpreter used actually depends on the platform, and the secondary Interpreter is not used on ARM or x86). The details on the macro assemblers unfortunately elude me since I am not a HotSpot expert (Although I hope to be one day), but to understand how instanceof works on x86 and ARM, you need to understand both x86 and ARM assembly. The Interpreter's instanceof opcode is implemented on x86 in
https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243
and on ARM, it is implemented in
https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/arm/templateTable_arm.cpp#L4182

Happy to help!

best regards,
Julian

From tanksherman27 at gmail.com Fri May 3 03:32:24 2024
From: tanksherman27 at gmail.com (Julian Waters)
Date: Fri, 3 May 2024 11:32:24 +0800
Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation
In-Reply-To:
References:
Message-ID:

Unfortunately that's where my knowledge ends.
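One way to picture why a print statement placed in a template function behaves differently from one placed in the code it produces is the toy model below. It is only a sketch under stated assumptions: `ToyInterpreter`, `ToyCounts`, `generate_instanceof`, `execute`, and `run_toy` are hypothetical names, integers stand in for classes, and a C++ closure stands in for generated machine code; none of this is HotSpot's actual implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy model only: none of these names exist in HotSpot.
struct ToyInterpreter {
    int generator_runs = 0;  // times the "template" function ran
    int handler_runs = 0;    // times the generated handler ran
    std::map<std::string, std::function<bool(int, int)>> handlers;

    ToyInterpreter() = default;
    // The handler captures `this`, so copying the object is disallowed.
    ToyInterpreter(const ToyInterpreter&) = delete;
    ToyInterpreter& operator=(const ToyInterpreter&) = delete;

    // Plays the role of a template function: it runs once and *produces*
    // the handler. A print statement here would fire once, at generation.
    void generate_instanceof() {
        ++generator_runs;
        handlers["instanceof"] = [this](int object_type, int target_type) {
            // A print statement here would fire on every executed bytecode.
            ++handler_runs;
            return object_type == target_type;
        };
    }

    // Looks up the previously generated handler and runs it.
    bool execute(const std::string& opcode, int object_type, int target_type) {
        auto it = handlers.find(opcode);
        assert(it != handlers.end() && "handler must be generated first");
        return it->second(object_type, target_type);
    }
};

struct ToyCounts {
    int generator_runs;
    int handler_runs;
};

// Generates the handler once, then executes it `executions` times.
static ToyCounts run_toy(int executions) {
    ToyInterpreter interp;
    interp.generate_instanceof();
    for (int i = 0; i < executions; ++i) {
        interp.execute("instanceof", 7, 7);
    }
    return {interp.generator_runs, interp.handler_runs};
}
```

In this model, `run_toy(3)` runs the generator once and the handler three times, so logging that should appear per bytecode would have to live in the produced handler rather than in the generator.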
The Template Table, as it's called, is only run once at the start of the program, to insert the assembly into executable memory, so I'm not sure how to properly add a log in the middle of the assembly inside the Interpreter. Maybe David or Thomas could help with this? Sorry I couldn't help you further.

best regards,
Julian

On Fri, May 3, 2024 at 11:21 AM zhengxianwei wrote:

> Thank you for your response, which helped clarify some basic knowledge
> about the JVM interpreter.
>
> Okay, let's set aside bytecodeInterpreter.cpp for now.
>
> However, I still have some confusion about the interpreter you mentioned
> in templateTable_x86.cpp. I conducted a test where I added a print
> statement in TemplateTable::instanceof():
>
> ```
> void TemplateTable::instanceof() {
>   print_info(os)("... %s:%d", __FILE__, __LINE__);
>   ...
> }
> ```
>
> When I used the default dbeaver.ini file, the added log was present,
> indicating that TemplateTable::instanceof was indeed executed.
>
> But when I modified dbeaver.ini to include an option -Xint (which I
> understand to mean running the program entirely using interpretation), the
> print disappeared. This suggests that after adding -Xint,
> TemplateTable::instanceof was not executed. So, I'm wondering if there are
> any other interpretations for instanceof after adding -Xint.
>
> On Fri, May 3, 2024 at 11:03 AM Julian Waters
> wrote:
>
>> Hi Xian Wei,
>>
>> No, you are right! The code in templateTable_x86.cpp that you linked to
>> in your post is not part of the Just in Time Compilers, it is part of the
>> x86 Interpreter! The Java HotSpot VM actually has 2 different Interpreters,
>> the primary Interpreter is written in large chunks of assembly specific to
>> each platform, which is then processed by the HotSpot macro assemblers.
>> The bytecodeInterpreter.cpp file you linked to is part of the second and less
>> often used Interpreter, which is why modifying the bytecodeInterpreter.cpp
>> instanceof implementation did nothing in your case (The Interpreter used
>> actually depends on the platform, and the secondary Interpreter is not used
>> on ARM or x86). The details on the macro assemblers unfortunately elude me
>> since I am not a HotSpot expert (Although I hope to be one day), but to
>> understand how instanceof works on x86 and ARM, you need to understand both
>> x86 and ARM assembly. The Interpreter's instanceof opcode is implemented on
>> x86 in
>> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243
>> and on ARM, it is implemented in
>> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/arm/templateTable_arm.cpp#L4182
>>
>> Happy to help!
>>
>> best regards,
>> Julian
>>
>

From aboldtch at openjdk.org Fri May 3 05:44:04 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 3 May 2024 05:44:04 GMT
Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v4]
In-Reply-To:
References:
Message-ID:

> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. [JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667)

Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: - Merge tag 'jdk-23+21' into JDK-8326957 Added tag jdk-23+21 for changeset e833bfc8 - Merge tag 'jdk-23+19' into JDK-8326957 Added tag jdk-23+19 for changeset 706b421c - Remove extra space - Use consistent terminology - Merge tag 'jdk-23+17' into JDK-8326957 Added tag jdk-23+17 for changeset 8efd7aa6 - Merge tag 'jdk-23+16' into JDK-8326957 Added tag jdk-23+16 for changeset d580bcf9 - Update VMDeprecatedOptions.java test - 8326957: Implementation of Deprecate Non-Generational ZGC ------------- Changes: https://git.openjdk.org/jdk/pull/18393/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18393&range=03 Stats: 107 lines in 7 files changed: 105 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18393/head:pull/18393 PR: https://git.openjdk.org/jdk/pull/18393 From duke at openjdk.org Fri May 3 06:24:54 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Fri, 3 May 2024 06:24:54 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v6] In-Reply-To: <7mbaVxLkGeEnczjYFmrAtu3vtwdZmZxgH06ZxAradkY=.e8ff6e41-4fea-4517-86d3-89f432b37ccd@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> <7mbaVxLkGeEnczjYFmrAtu3vtwdZmZxgH06ZxAradkY=.e8ff6e41-4fea-4517-86d3-89f432b37ccd@github.com> Message-ID: <7xbz4cpe2Je-WAWeZ17YSu05m93hGEUarw4YwvNqF50=.11095cda-d9ae-439e-8884-0f36ef7ef58d@github.com> On Thu, 2 May 2024 21:32:50 GMT, Chris Plummer wrote: >> test/hotspot/jtreg/serviceability/sa/TestG1HeapRegion.java line 62: >> >>> 60: agent.attach(Integer.parseInt(pid)); >>> 61: G1CollectedHeap heap = (G1CollectedHeap)VM.getVM().getUniverse().heap(); >>> 62: G1HeapRegion hr = heap.hrm().heapRegionIterator().next(); >> >> "g1HeapRegionIterator"? > > And here also it seems you agreed with this suggestion but the change was never made. 
I think we can make these changes in later PRs in order to avoid making this one even larger. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18871#discussion_r1588784617 From alanb at openjdk.org Fri May 3 06:42:55 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 3 May 2024 06:42:55 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Thu, 2 May 2024 21:44:43 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: Corrections in: 1) JVMTI/JDWP spec; 2) test vthread checks; 3) test comments > > src/hotspot/share/prims/jvmti.xml line 8280: > >> 8278: >> 8279: The number of platform threads waiting to own this monitor, or 0 >> 8280: if only virtual threads are waiting or no threads are waiting > > This is now exactly the same as `waiter_count` above. I don't think this is what you intended. Indeed, looks like the description for waiter_count has been pasted in here in error. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1588796716 From tschatzl at openjdk.org Fri May 3 06:48:52 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 May 2024 06:48:52 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v6] In-Reply-To: <7xbz4cpe2Je-WAWeZ17YSu05m93hGEUarw4YwvNqF50=.11095cda-d9ae-439e-8884-0f36ef7ef58d@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> <7mbaVxLkGeEnczjYFmrAtu3vtwdZmZxgH06ZxAradkY=.e8ff6e41-4fea-4517-86d3-89f432b37ccd@github.com> <7xbz4cpe2Je-WAWeZ17YSu05m93hGEUarw4YwvNqF50=.11095cda-d9ae-439e-8884-0f36ef7ef58d@github.com> Message-ID: <7cYZUtNt8Xl8sjIWBNpspMhLR-XUaimVawGI2f5j57g=.8cdbc54e-4f85-4526-bba8-35432e82a1ce@github.com> On Fri, 3 May 2024 06:21:54 GMT, Lei Zaakjyu wrote: >> And here also it seems you agreed with this suggestion but the change was never made. > > I think we can make these changes in later PRs in order to avoid making this one even larger. As mentioned earlier, I also think the SA changes should be done here in this CR since they are fairly minor and related to single methods, so not as encapsulated as changes to the remaining class names (for which I already created an RFR https://bugs.openjdk.org/browse/JDK-8331385). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18871#discussion_r1588800658 From alanb at openjdk.org Fri May 3 06:51:51 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 3 May 2024 06:51:51 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: On Thu, 2 May 2024 22:40:02 GMT, Chris Plummer wrote: > It seems maybe we should instead be trying to avoid these events by preloading the classes as was suggested as an option in the CR. I don't think preloading PinnedThreadPrinter will solve it completely. First usage could potentially load a lot of other classes, e.g. 
the current implementation uses streams and several other APIs. Going forward, this debugging option should be removed. It's already removed in the loom repo. It has been the source of several issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19054#issuecomment-2092406784 From tschatzl at openjdk.org Fri May 3 06:52:51 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 May 2024 06:52:51 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:40:35 GMT, Aleksey Shipilev wrote: > `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). > > Doing this rename separately guarantees we have caught and renamed all current uses. > > Additional testing: > - [ ] Linux AArch64 server fastdebug, `all` Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19064#pullrequestreview-2037494618 From thomas.schatzl at oracle.com Fri May 3 07:12:08 2024 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 3 May 2024 09:12:08 +0200 Subject: RFR: 8319548: Unexpected internal name for Filler array klass causes error in VisualVM In-Reply-To: References: Message-ID: On 30.04.24 03:38, jjscl8888 wrote: > Thank you for your clarification. if the instance in question had no > traffic but you observed a sudden increase in the old generation size > at 2:35 in the graph, and subsequent garbage collections (GCs) did not > reduce the size of the old generation back to its original value Collectors are fairly reluctant to give back memory to the OS. 
For G1 in particular, there are the options `MinHeapFreeRatio` and `MaxHeapFreeRatio` which to some degree steer commit and uncommit. * `MinHeapFreeRatio` is "The minimum percentage of heap free after GC to avoid expansion", i.e. minimum amount of memory should be kept free. Default is 40%, i.e. expands if less than that amount of memory is free. * `MaxHeapFreeRatio` is "The maximum percentage of heap free after GC to avoid shrinking", i.e. maximum amount of memory that should be kept free. Default is 70%; i.e. only shrinks the heap if more than 70% of memory is free. Not sure the latter condition is met here to shrink, and without logs (`-Xlog:gc+ergo+heap=debug`) this is just a guess. Also, this kind of heap resizing (including shrinking) only occurs in the Remark pause. So to decrease the heap more aggressively, it might work to decrease `MaxHeapFreeRatio` (and probably `MinHeapFreeRatio` because for such large heaps the default values are maybe not optimal). Hth, Thomas From mbaesken at openjdk.org Fri May 3 07:38:00 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 May 2024 07:38:00 GMT Subject: RFR: 8331428: ubsan: JVM flag checking complains about MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc Message-ID: Seems MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc check uint values (see gc_globals.hpp). However those functions have uintx in the check functions. 
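As an aside for readers of this thread, the mismatch described above can be reproduced outside HotSpot. The following is a minimal sketch, not the actual JVM flag code: `FlagError`, `UintConstraintFunc`, `threshold_constraint`, `threshold_constraint_wide` and `check` are all hypothetical names, and the limit of 15 is arbitrary. It shows a constraint function whose parameter type matches the function-pointer type it is called through, next to one that takes a wider integer and therefore has a different function type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for illustration only; these are not HotSpot names.
enum class FlagError { Success, ViolatesConstraint };

// The pointer type the (hypothetical) checker calls through: takes unsigned int.
using UintConstraintFunc = FlagError (*)(unsigned int value, bool verbose);

// Matching signature: the parameter is unsigned int, same as the pointer type.
FlagError threshold_constraint(unsigned int value, bool /*verbose*/) {
  return value <= 15u ? FlagError::Success : FlagError::ViolatesConstraint;
}

// Mismatched signature: the parameter is unsigned long, so this is a
// different function type even though the body is the same.
FlagError threshold_constraint_wide(unsigned long value, bool /*verbose*/) {
  return value <= 15ul ? FlagError::Success : FlagError::ViolatesConstraint;
}

// The compiler can confirm which function matches the pointer type.
static_assert(std::is_same<decltype(&threshold_constraint),
                           UintConstraintFunc>::value,
              "matching constraint has the expected function type");
static_assert(!std::is_same<decltype(&threshold_constraint_wide),
                            UintConstraintFunc>::value,
              "wide constraint has a different function type");

// Calls a constraint through the typed pointer, as a flag checker would.
FlagError check(UintConstraintFunc constraint, unsigned int value) {
  assert(constraint != nullptr);
  return constraint(value, false);
}
```

Calling the wide variant through a `UintConstraintFunc` would require a cast and is undefined behavior in C++; that is the kind of call Clang's `-fsanitize=function` check reports, so the sketch deliberately never makes it.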
This causes Ubsan to complain : /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function MaxTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' jvmFlagConstraintsGC.cpp:188: note: MaxTenuringThresholdConstraintFunc(unsigned long, bool) defined here #0 0x10541cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 #1 0x1054253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 #2 0x105f20b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 #3 0x10538c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 #4 0x10342e71c in JavaMain java.c:491 #5 0x103435248 in ThreadJavaMain java_md_macosx.m:720 #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function InitialTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' jvmFlagConstraintsGC.cpp:177: note: InitialTenuringThresholdConstraintFunc(unsigned long, bool) defined here #0 0x117b1cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 #1 0x117b253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 #2 0x118620b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 #3 0x117a8c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 #4 0x10077e71c in JavaMain java.c:491 #5 0x100785248 in ThreadJavaMain java_md_macosx.m:720 #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) and 
/jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:157:12: runtime error: call to function AllocatePrefetchStepSizeConstraintFunc(long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(int, bool)' jvmFlagConstraintsCompiler.cpp:70: note: AllocatePrefetchStepSizeConstraintFunc(long, bool) defined here #0 0x10239bcee in FlagAccessImpl_int::typed_check_constraint(void*, int, bool) const jvmFlagAccess.cpp:157 #1 0x1023a53d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 #2 0x102ee640b in universe_init() universe.cpp:875 #3 0x10213ee27 in init_globals() init.cpp:128 #4 0x102ea0d69 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:553 #5 0x10230c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 #6 0x10041271c in JavaMain java.c:491 #7 0x100419248 in ThreadJavaMain java_md_macosx.m:720 #8 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) #9 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) ------------- Commit messages: - JDK-8331428 Changes: https://git.openjdk.org/jdk/pull/19074/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19074&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331428 Stats: 11 lines in 4 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19074.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19074/head:pull/19074 PR: https://git.openjdk.org/jdk/pull/19074 From stefank at openjdk.org Fri May 3 07:49:52 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 3 May 2024 07:49:52 GMT Subject: RFR: 8331428: ubsan: JVM flag checking complains about MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc In-Reply-To: References: Message-ID: On Fri, 3 May 2024 07:32:35 GMT, Matthias Baesken wrote: > Seems MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and 
AllocatePrefetchStepSizeConstraintFunc check uint values (see gc_globals.hpp). However those functions have uintx in the check functions. > This causes Ubsan to complain : > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function MaxTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:188: note: MaxTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x10541cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x1054253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x105f20b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x10538c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10342e71c in JavaMain java.c:491 > #5 0x103435248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function InitialTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:177: note: InitialTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x117b1cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x117b253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x118620b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x117a8c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10077e71c in JavaMain java.c:491 > #5 0x100785248 in ThreadJavaMain java_md_macosx.m:720 > #6 
0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) > > and > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:157:12: runtime error: call to function AllocatePrefetchStepSizeConstraintFunc(long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(int, bool)' > jvmFlagConstrain... Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19074#pullrequestreview-2037589801 From aboldtch at openjdk.org Fri May 3 07:59:51 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 3 May 2024 07:59:51 GMT Subject: RFR: 8331428: ubsan: JVM flag checking complains about MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc In-Reply-To: References: Message-ID: On Fri, 3 May 2024 07:32:35 GMT, Matthias Baesken wrote: > Seems MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc check uint values (see gc_globals.hpp). However those functions have uintx in the check functions. 
> This causes Ubsan to complain : > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function MaxTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:188: note: MaxTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x10541cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x1054253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x105f20b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x10538c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10342e71c in JavaMain java.c:491 > #5 0x103435248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function InitialTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:177: note: InitialTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x117b1cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x117b253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x118620b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x117a8c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10077e71c in JavaMain java.c:491 > #5 0x100785248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe 
(libsystem_pthread.dylib:x86_64+0x2442) > > and > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:157:12: runtime error: call to function AllocatePrefetchStepSizeConstraintFunc(long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(int, bool)' > jvmFlagConstrain... Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19074#pullrequestreview-2037605908 From john.r.rose at oracle.com Fri May 3 08:16:38 2024 From: john.r.rose at oracle.com (John Rose) Date: Fri, 03 May 2024 01:16:38 -0700 Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v3] In-Reply-To: References: <4__55RnizjcZwBGgP4QlfXXX6HBzn5jbRn_xrRPE4uM=.994bc41d-4bb3-4b63-b6dc-b533b598d0a6@github.com> <2K-VA9DRH9DAgDL9HB__STvlnE0gSBRjPNU3NLOrZT0=.7ee74867-cf57-4c13-bd54-751425d2793a@github.com> Message-ID: <29531CC3-8FF9-428D-A981-0608DBF2E52F@oracle.com> On 1 May 2024, at 13:30, Ioi Lam wrote: > I am also leaning towards removing the `close()` call. Otherwise it would be unsymmetrical - the `inputStream` doesn't open the `_input` automatically, but it will close it automatically for us. > > It seems better to leave both the `open` and `close` to the caller of the `inputStream`. > > @rose00 what do you think? You guys are right; the caller should deal with any "close". And that removes the need for a virtual close method, I think? In which case, yes, ditch close() completely. The more important API point is set_input. Glad that's staying. From gli at openjdk.org Fri May 3 08:20:53 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 3 May 2024 08:20:53 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:40:35 GMT, Aleksey Shipilev wrote: > `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity.
The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). > > Doing this rename separately guarantees we have caught and renamed all current uses. > > Additional testing: > - [ ] Linux AArch64 server fastdebug, `all` Looks good. ------------- Marked as reviewed by gli (Committer). PR Review: https://git.openjdk.org/jdk/pull/19064#pullrequestreview-2037637061 From tschatzl at openjdk.org Fri May 3 08:26:00 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 May 2024 08:26:00 GMT Subject: RFR: 8331428: ubsan: JVM flag checking complains about MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc In-Reply-To: References: Message-ID: On Fri, 3 May 2024 07:32:35 GMT, Matthias Baesken wrote: > Seems MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc check uint values (see gc_globals.hpp). However those functions have uintx in the check functions. 
> This causes Ubsan to complain : > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function MaxTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:188: note: MaxTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x10541cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x1054253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x105f20b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x10538c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10342e71c in JavaMain java.c:491 > #5 0x103435248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function InitialTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:177: note: InitialTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x117b1cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x117b253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x118620b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x117a8c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10077e71c in JavaMain java.c:491 > #5 0x100785248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe 
(libsystem_pthread.dylib:x86_64+0x2442) > > and > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:157:12: runtime error: call to function AllocatePrefetchStepSizeConstraintFunc(long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(int, bool)' > jvmFlagConstrain... Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19074#pullrequestreview-2037644438 From jsjolen at openjdk.org Fri May 3 08:28:06 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 3 May 2024 08:28:06 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v46] In-Reply-To: <8Pkr2lOm0YS7yPAEZooSGXR1WhOwyDkv2ej0qxCOKp4=.513c6399-f24e-4145-bcc9-e19eb0243949@github.com> References: <8Pkr2lOm0YS7yPAEZooSGXR1WhOwyDkv2ej0qxCOKp4=.513c6399-f24e-4145-bcc9-e19eb0243949@github.com> Message-ID: On Thu, 2 May 2024 14:38:43 GMT, Thomas Stuefe wrote: >> Let's wait with this until we actually port over the `VirtualMemoryTracker` to use `VMATree`. > > I think we should rethink recording specific stacks for uncommitted memory. I don't believe anyone cares who reserves uncommitted memory; or who uncommits memory. And this only leads to splintering the tree, if we uncommit from different callsites. We should consider keeping stacks for committed memory only, and use some noop stack placeholder for uncommitted memory. The issue, as I see it, is that we think of committing memory as a "layering" on top of reserving memory, and when that commit goes away the underlying layer of reserved memory is exposed again. In our VMATree, we don't store that underlying reservation anymore. So what to do? If we add callstack and MEMFLAGS for uncommitting memory then that's an easy solution. The best would be to keep VMT's semantics here. We can do that, if the metadata stored is doubled in size per node and we recognise this pattern. Still, I'll re-iterate: This is a problem for tomorrow, when we do port VMT.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1588899271 From dnsimon at openjdk.org Fri May 3 08:28:56 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 May 2024 08:28:56 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: References: Message-ID: On Wed, 1 May 2024 09:04:11 GMT, David Holmes wrote: >> This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. >> >> The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. 
This can fail as shown here:
>>
>> V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168)
>> V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140)
>> V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138)
>> V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377)
>> V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509)
>> V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273)
>> V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291)
>> V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379)
>> V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368)
>> V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328)
>> V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251)
>> V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800)
>> V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395)
>> V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348)
>>
>> These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ec...
>
> I don't think "sandbox" fits in this context:
>> Sandboxing is a security practice in which you use an isolated environment, or a "sandbox," for testing. Within the sandbox you run code, analyze the code in a safe, isolated environment without affecting the application, system or platform.

Any remaining concerns @dholmes-ora ?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18925#issuecomment-2092545887 From jkern at openjdk.org Fri May 3 08:34:57 2024 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 3 May 2024 08:34:57 GMT Subject: Integrated: 8330539: Use #include <alloca.h> instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:54:14 GMT, Joachim Kern wrote: > We need to find a better way to handle alloca on AIX. > > See the discussion in the PR for https://bugs.openjdk.org/browse/JDK-8329257, e.g. https://github.com/openjdk/jdk/pull/18536#discussion_r1568650313 in which three alternatives are suggested. Quoting: > > Let me summarize the choices we have and ask for your vote. > Magnus dislikes the -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 I introduced to get rid of > > #if defined(_AIX) > #include <alloca.h> > #endif > > in globalDefinitions_gcc.hpp. > > We have four possible solutions > > 1. Reintroduce > > #if defined(_AIX) > #include <alloca.h> > #endif > > in globalDefinitions_gcc.hpp. > > 2. Unconditionally introduce only #include <alloca.h> in globalDefinitions_gcc.hpp. This should work for all platforms using this header including the unofficial Windows/gcc Port, although only AIX needs it. > > 3. Add > > #if defined(_AIX) > #include <alloca.h> > #endif > > to the sources using alloca(). These are > /hotspot/share/runtime/os.cpp > /hotspot/share/runtime/javaThread.cpp > /hotspot/share/utilities/vmError.cpp > Here we need the AIX condition, because otherwise the classic Windows Build (NTAMD64) fails. > > 4. Replace -Dalloca'(size)'=__builtin_alloca'(size)' in flags-cflags.m4 by -U__STRICT_ANSI__ at the same place. Explanation can also be found in https://github.com/openjdk/jdk/pull/18536#discussion_r1583360569 and following. > > I will implement the solution with the most likes and having no dislike. This pull request has now been integrated.
Changeset: a10845b5 Author: Joachim Kern Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/a10845b553fc6fe7e06a0f37ce73fe5f704dc7c4 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod 8330539: Use #include <alloca.h> instead of -Dalloca'(size)'=__builtin_alloca'(size)' for AIX Reviewed-by: jwaters, mdoerr, kbarrett, ihse ------------- PR: https://git.openjdk.org/jdk/pull/19053 From sspitsyn at openjdk.org Fri May 3 09:02:54 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 3 May 2024 09:02:54 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: <4W4NxkWeMMZng4oTDBWpJp3glMe-fy0onT-G-KRi-Q0=.28368d7b-0219-41bd-a37b-69d5f8880808@github.com> On Fri, 3 May 2024 06:49:09 GMT, Alan Bateman wrote: > It seems maybe we should instead be trying to avoid these events by preloading the classes as was suggested as an option in the CR. The problem is that all such cases are unknown, so it is just not realistic. As Alan commented, this exact debugging option will be removed soon. The other cases are extremely rare because they were not identified with the assert in place (in fact, these asserts have existed for a couple of releases for exactly this purpose). However, such classes can cause real problems (deadlocks and other issues as we see) if they are not ignored by `CFLH`, `ClassPrepare` and `ClassLoad` posting code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19054#issuecomment-2092596008 From jsjolen at openjdk.org Fri May 3 10:13:22 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 3 May 2024 10:13:22 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v57] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space.
This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
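To make the summary-reporting idea above concrete for readers of the archive, here is a minimal sketch of a toy tracker. It is not the PR's `MemoryFileTracker` and does not use any HotSpot types: `Tag`, `ToyFileTracker`, `allocate`, `release` and `committed` are hypothetical names, ranges are assumed never to overlap, and call stacks are left out entirely. It only shows committed ranges being recorded by offset within one "file" and summed per tag.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical, simplified model for illustration; not HotSpot code.
enum class Tag { JavaHeap, Other };

class ToyFileTracker {
  // One committed range: its size in bytes and the tag it is accounted to.
  struct Range {
    std::size_t size;
    Tag tag;
  };
  // Maps the start offset of each committed range to its description.
  std::map<std::size_t, Range> _ranges;

public:
  // Record [offset, offset + size) as committed under the given tag.
  // Zero-sized requests are ignored. Assumes ranges never overlap.
  void allocate(std::size_t offset, std::size_t size, Tag tag) {
    if (size == 0) return;
    _ranges[offset] = Range{size, tag};
  }

  // Forget a range that was recorded starting at exactly this offset.
  void release(std::size_t offset) {
    _ranges.erase(offset);
  }

  // Summary view: total committed bytes accounted to one tag.
  std::size_t committed(Tag tag) const {
    std::size_t total = 0;
    for (const auto& entry : _ranges) {
      if (entry.second.tag == tag) total += entry.second.size;
    }
    return total;
  }
};
```

A real implementation would also need to split and merge overlapping ranges and keep call stacks for detailed reports; the sketch leaves that out.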
> > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 119 commits: - Merge remote-tracking branch 'openjdk/master' into nmt-physical-device - Seed from os::random - verify_self - typedef size_t into position - Use prime number for number of buckets - Constify Metadata - Another check - assert device != nullptr in MemoryFileTracker::instance - Explicitly handle 0-sized mappings as no-ops - Missing return in upsert causes duplicate keys - ... and 109 more: https://git.openjdk.org/jdk/compare/a10845b5...45bcdaba ------------- Changes: https://git.openjdk.org/jdk/pull/18289/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=56 Stats: 1825 lines in 21 files changed: 1717 ins; 85 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From azafari at openjdk.org Fri May 3 10:14:04 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 3 May 2024 10:14:04 GMT Subject: RFR: 8331540: [BACKOUT] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API Message-ID: reverted the changes.
------------- Commit messages: - 8330076: [BACKOUT] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API Changes: https://git.openjdk.org/jdk/pull/19080/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19080&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331540 Stats: 449 lines in 62 files changed: 51 ins; 29 del; 369 mod Patch: https://git.openjdk.org/jdk/pull/19080.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19080/head:pull/19080 PR: https://git.openjdk.org/jdk/pull/19080 From jwilhelm at openjdk.org Fri May 3 10:19:56 2024 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Fri, 3 May 2024 10:19:56 GMT Subject: RFR: 8331540: [BACKOUT] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:09:06 GMT, Afshin Zafari wrote: > reverted the changes. Looks good. ------------- Marked as reviewed by jwilhelm (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19080#pullrequestreview-2037833419 From azafari at openjdk.org Fri May 3 10:19:56 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 3 May 2024 10:19:56 GMT Subject: RFR: 8331540: [BACKOUT] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:14:59 GMT, Jesper Wilhelmsson wrote: >> reverted the changes. > > Looks good. Thanks @JesperIRL. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19080#issuecomment-2092712777 From azafari at openjdk.org Fri May 3 10:19:56 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 3 May 2024 10:19:56 GMT Subject: Integrated: 8331540: [BACKOUT] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API In-Reply-To: References: Message-ID: On Fri, 3 May 2024 10:09:06 GMT, Afshin Zafari wrote: > reverted the changes. 
This pull request has now been integrated. Changeset: f665e07a Author: Afshin Zafari URL: https://git.openjdk.org/jdk/commit/f665e07ab223bdabb6cf3f653f799913d874bc55 Stats: 449 lines in 62 files changed: 51 ins; 29 del; 369 mod 8331540: [BACKOUT] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API Reviewed-by: jwilhelm ------------- PR: https://git.openjdk.org/jdk/pull/19080 From sspitsyn at openjdk.org Fri May 3 10:33:53 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 3 May 2024 10:33:53 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v2] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Fri, 3 May 2024 06:40:07 GMT, Alan Bateman wrote: >> src/hotspot/share/prims/jvmti.xml line 8280: >> >>> 8278: >>> 8279: The number of platform threads waiting to own this monitor, or 0 >>> 8280: if only virtual threads are waiting or no threads are waiting >> >> This is now exactly the same as `waiter_count` above. I don't think this is what you intended. > > Indeed, looks like the description for waiter_count has been pasted in here in error. Thank you. 
Fixed as below: diff --git a/src/hotspot/share/prims/jvmti.xml b/src/hotspot/share/prims/jvmti.xml index d382a02178e..3bcf15466d7 100644 --- a/src/hotspot/share/prims/jvmti.xml +++ b/src/hotspot/share/prims/jvmti.xml @@ -8277,7 +8277,8 @@ class C2 extends C1 implements I2 { The number of platform threads waiting to own this monitor, or 0 - if only virtual threads are waiting or no threads are waiting + if only virtual threads are waiting to be notified or no threads are waiting + to be notified ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1589031880 From sspitsyn at openjdk.org Fri May 3 10:53:04 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 3 May 2024 10:53:04 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v3] In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: <2ou2F-YXLZM2QCxGp86nhs-GBwDG4hfHvwrKMRxal84=.0c26f8c5-99fb-4b84-b19f-490e68c8c4fa@github.com> > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. 
> > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. > > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. > > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. > > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: tweaks in JVMTI and JDWP changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19030/files - new: https://git.openjdk.org/jdk/pull/19030/files/7465f064..e7c2d652 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=01-02 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19030/head:pull/19030 PR: https://git.openjdk.org/jdk/pull/19030 From sspitsyn at openjdk.org Fri May 3 10:53:04 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 3 May 2024 10:53:04 GMT Subject: RFR: 8328083: degrade virtual thread 
support for GetObjectMonitorUsage [v3]
In-Reply-To: <2lhm2l4CzUnyStTj215njaZg9EcMwwKWxMxtdZTXD8I=.ba8b1275-f16c-4af4-80e5-81ace9b40aa2@github.com>
References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> <2lhm2l4CzUnyStTj215njaZg9EcMwwKWxMxtdZTXD8I=.ba8b1275-f16c-4af4-80e5-81ace9b40aa2@github.com>
Message-ID:

On Thu, 2 May 2024 21:47:50 GMT, Chris Plummer wrote:

>> expEnteringCount/expWaitingCount contain the tested patterns. I don't see why they can't just replace NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS in the comments also. In fact it is confusing if you don't because code right below the comments references expEnteringCount/expWaitingCount, not NUMBER_OF_ENTERING_THREADS/NUMBER_OF_WAITING_THREADS.
> 
> ...and there are also comments above with this issue.

> expEnteringCount/expWaitingCount contain the tested patterns.

I kind of disagree. Please take a look at the loop below:

    for (int i = 0; i < NUMBER_OF_WAITING_THREADS; i++) {
        expEnteringCount = isVirtual ? 0 : NUMBER_OF_ENTERING_THREADS + i + 1;
        expWaitingCount = isVirtual ? 0 : NUMBER_OF_WAITING_THREADS - i - 1;
        lockCheck.notify(); // notify waiting threads one by one
        // now the notified WaitingTask has to be blocked on the lockCheck re-enter

        // entry count: 1
        // count of threads waiting to enter: NUMBER_OF_ENTERING_THREADS
        // count of threads waiting to re-enter: i + 1
        // count of threads waiting to be notified: NUMBER_OF_WAITING_THREADS - i - 1
        check(lockCheck, expOwnerThread(), expEntryCount(), expEnteringCount, expWaitingCount);
    }

The comment fixed as you suggest does not look useful anymore, as the tested pattern is lost:

        // entry count: expOwnerThread()
        // count of threads waiting to enter: expEnteringCount
        // count of threads waiting to re-enter: expEntryCount()
        // count of threads waiting to be notified: expWaitingCount
        check(lockCheck, expOwnerThread(), expEntryCount(), expEnteringCount, expWaitingCount);
    }

I understand your concern, but your suggestion is not that good. We could remove these comments, but the tested pattern would be thrown away with them. Would it help if we add clarifications that the comments are correct for platform threads only?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1589041287

From kevinw at openjdk.org Fri May 3 11:25:08 2024
From: kevinw at openjdk.org (Kevin Walls)
Date: Fri, 3 May 2024 11:25:08 GMT
Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v7]
In-Reply-To:
References:
Message-ID:

> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself.
> 
> Access to it in is_lock_owned() was racy, and caused rare crashes.
Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: JavaThread comment update and synchronizer check before cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/54086ccd..2989ad4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=05-06 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From jsjolen at openjdk.org Fri May 3 11:55:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 3 May 2024 11:55:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v58] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Must be static

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/45bcdaba..32263c94

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=57
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=56-57

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From gli at openjdk.org Fri May 3 12:04:14 2024
From: gli at openjdk.org (Guoxiong Li)
Date: Fri, 3 May 2024 12:04:14 GMT
Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure
Message-ID:

Hi all,

After [JDK-8296875](https://bugs.openjdk.org/browse/JDK-8296875), the classes `EncodeGCModeConcurrentFrameClosure` and `TransformStackChunkClosure` almost have the same code. This patch consolidates them into one.

The tests `make test-hotspot_loom` and `make test-hotspot_gc` passed locally (linux & x64). Thanks for taking the time to review.
Best Regards, -- Guoxiong ------------- Commit messages: - JDK-8331608 Changes: https://git.openjdk.org/jdk/pull/19084/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19084&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331608 Stats: 27 lines in 1 file changed: 1 ins; 21 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19084/head:pull/19084 PR: https://git.openjdk.org/jdk/pull/19084 From jsjolen at openjdk.org Fri May 3 12:08:21 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 3 May 2024 12:08:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v59] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:

 - Fix what we put into the GA
 - Move out

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/32263c94..c70203db

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=58
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=57-58

  Stats: 93 lines in 2 files changed: 30 ins; 62 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From stefank at openjdk.org Fri May 3 12:23:52 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 3 May 2024 12:23:52 GMT
Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure
In-Reply-To:
References:
Message-ID:

On Fri, 3 May 2024 11:58:36 GMT, Guoxiong Li wrote:

> Hi all,
> 
> After [JDK-8296875](https://bugs.openjdk.org/browse/JDK-8296875), the classes `EncodeGCModeConcurrentFrameClosure` and `TransformStackChunkClosure` almost have the same code. This patch consolidates them into one.
> 
> The tests `make test-hotspot_loom` and `make test-hotspot_gc` passed locally (linux & x64). Thanks for taking the time to review.
> 
> Best Regards,
> -- Guoxiong

There doesn't seem to be a need to place the `derived_cl` closures in the callers. Could you keep it within `do_frame` as we used to do in `TransformStackChunkClosure`?

-------------

Changes requested by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19084#pullrequestreview-2038024207 From gli at openjdk.org Fri May 3 12:28:51 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 3 May 2024 12:28:51 GMT Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure In-Reply-To: References: Message-ID: On Fri, 3 May 2024 12:21:09 GMT, Stefan Karlsson wrote: > There doesn't seem to be a need to place the `derived_cl` closures in the callers. Could you keep it within `do_frame` as we used to do in `TransformStackChunkClosure`? If moving it into `do_frame`, everytime the `do_frame` is invoked, we need to construct a `RelativizeClosure` object. Will it cost more running time? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19084#issuecomment-2092916999 From stefank at openjdk.org Fri May 3 12:46:51 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 3 May 2024 12:46:51 GMT Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure In-Reply-To: References: Message-ID: <1QTQUUCrRthzWNXH0myS9SZFYafaSNpGOy3VuVgWfFk=.7c7e5385-cc38-4f1c-ac19-f8cd9bd9c26b@github.com> On Fri, 3 May 2024 12:26:33 GMT, Guoxiong Li wrote: > > There doesn't seem to be a need to place the `derived_cl` closures in the callers. Could you keep it within `do_frame` as we used to do in `TransformStackChunkClosure`? > > If moving it into `do_frame`, everytime the `do_frame` is invoked, we need to construct a `RelativizeClosure` object. Will it cost more running time? I don't think you need to worry about that. It could have been worth thinking about if there was a large state that needed to be set up for every frame, but that's not the case here. The compiler will likely handle this well. If you are still not convinced I think you could disassemble the code with and without the suggestion and compare the two. 
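
As a rough illustration of the point made in this exchange, the sketch below constructs a closure object inside a per-frame function. All types here (`ToyFrame`, `ToyDerivedSlot`, `ToyRelativizeClosure`, `ToyFrameClosure`) are hypothetical stand-ins, not the HotSpot classes, and the "relativize" step is simplified to storing a derived value as an offset from its base. The assumption being illustrated is that a closure with no data members has nothing to set up, so creating one per frame is unlikely to be measurable.

```c++
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not the HotSpot types.
struct ToyDerivedSlot {
  long base;     // value of the base pointer
  long derived;  // value of a pointer derived from that base
};

struct ToyFrame {
  std::vector<ToyDerivedSlot> slots;
};

// A closure with no data members: constructing it initializes nothing.
struct ToyRelativizeClosure {
  // Replace the derived value by its offset from the base.
  void do_derived(const long* base_loc, long* derived_loc) const {
    *derived_loc -= *base_loc;
  }
};

struct ToyFrameClosure {
  std::size_t frames_seen = 0;

  void do_frame(ToyFrame& f) {
    // Constructed on every call, as suggested in the review, instead of
    // being created once in the caller and passed in.
    ToyRelativizeClosure derived_cl;
    for (ToyDerivedSlot& s : f.slots) {
      derived_cl.do_derived(&s.base, &s.derived);
    }
    ++frames_seen;
  }
};
```

Whether a given compiler removes the object entirely is not shown here; comparing the generated code with and without the change, as suggested in the thread, is the way to confirm it.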
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19084#issuecomment-2092944650

From ayang at openjdk.org Fri May 3 12:49:03 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 3 May 2024 12:49:03 GMT
Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v2]
In-Reply-To:
References:
Message-ID: <2E8psdsbHlnXaWjLMnhAHsoywFxY-jWEhHqAU4699_8=.83ba590a-2357-4924-a74a-e972b70b60da@github.com>

> It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entry points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probably fail), fall back to full-gc.
> 
> Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outside the VM (by a third-party tool).
> 
> Test: tier1-6

Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:

  s1-do-collect

-------------

Changes: https://git.openjdk.org/jdk/pull/19056/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=01
  Stats: 558 lines in 15 files changed: 126 ins; 348 del; 84 mod
  Patch: https://git.openjdk.org/jdk/pull/19056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19056/head:pull/19056

PR: https://git.openjdk.org/jdk/pull/19056

From tschatzl at openjdk.org Fri May 3 12:52:59 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 3 May 2024 12:52:59 GMT
Subject: RFR: 8319548: Unexpected internal name for Filler array klass causes error in VisualVM
In-Reply-To:
References:
Message-ID:

On Tue, 19 Dec 2023 10:08:14 GMT, Thomas Schatzl wrote:

> Hi all,
> 
> please review this change that changes the filler array class name (again) after user feedback.
> 
> In particular, the previous name `Ljdk/internal/vm/FillerArray;` confuses some tools (https://github.com/oracle/visualvm/issues/523). I.e. it's not an array, but still variable sized.
> This change adds the `[` array bracket, and renames the element name to not have `Array` inside to not try to pretend that the element is some other kind of array. > > Testing: tier1-6 > > Thanks, > Thomas (because the bot does not seem to forward the answer from the mailing list within a few hours; fwiw, it has been pure luck that I stumbled across that question within github): On 30.04.24 03:38, jjscl8888 wrote: > Thank you for your clarification. if the instance in question had no > traffic but you observed a sudden increase in the old generation size > at 2:35 in the graph, and subsequent garbage collections (GCs) did not > reduce the size of the old generation back to its original value Collectors are fairly reluctant to give back memory to the OS. For G1 in particular, there are the options `MinHeapFreeRatio` and `MaxHeapFreeRatio` which to some degree steer commit and uncommit. * `MinHeapFreeRatio` is "The minimum percentage of heap free after GC to avoid expansion", i.e. minimum amount of memory should be kept free. Default is 40%, i.e. expands if less than that amount of memory is free. * `MaxHeapFreeRatio` is "The maximum percentage of heap free after GC to avoid shrinking", i.e. maximum amount of memory that should be kept free. Default is 70%; i.e. only shrinks the heap if more than 70% of memory is free. Not sure the latter condition is met here to shrink, and without logs (`-Xlog:gc+ergo+heap=debug`) this is just a guess. Also, this kind of heap resizing (including shrinking) only occurs in the Remark pause. So to decrease the heap more aggressively, it might work to decrease `MaxHeapFreeRatio` (and probably `MinHeapFreeRatio` too because for such large heaps the default values are maybe not optimal). 
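
The two thresholds described above can be written down as a small decision function. This is only a sketch of the stated rule (expand when less than `MinHeapFreeRatio` percent is free, shrink when more than `MaxHeapFreeRatio` percent is free); the name `toy_resize_decision` and the `ToyResize` enum are hypothetical, and the real G1 sizing logic takes more inputs than this. The defaults of 40 and 70 are the ones quoted in the message.

```c++
#include <cassert>
#include <cstddef>

enum class ToyResize { Expand, Keep, Shrink };

// Hypothetical helper: decide what the thresholds alone would suggest
// after a GC, given the heap capacity and the bytes still in use.
ToyResize toy_resize_decision(std::size_t capacity, std::size_t used,
                              unsigned min_free_pct = 40,
                              unsigned max_free_pct = 70) {
  assert(capacity > 0 && used <= capacity);
  assert(min_free_pct <= max_free_pct && max_free_pct <= 100);

  const std::size_t free_bytes = capacity - used;

  // Compare free/capacity against the percentages without floating
  // point. Assumes capacity * 100 does not overflow std::size_t.
  if (free_bytes * 100 < capacity * min_free_pct) return ToyResize::Expand;
  if (free_bytes * 100 > capacity * max_free_pct) return ToyResize::Shrink;
  return ToyResize::Keep;
}
```

With the defaults, a heap that is 50% free after GC stays as it is; lowering `max_free_pct` to, say, 40 would make the same heap a candidate for shrinking, which is the effect of decreasing `MaxHeapFreeRatio` suggested above.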
Hth, Thomas ------------- PR Comment: https://git.openjdk.org/jdk/pull/17155#issuecomment-2092954588 From mbaesken at openjdk.org Fri May 3 13:04:56 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 May 2024 13:04:56 GMT Subject: RFR: 8331428: ubsan: JVM flag checking complains about MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc In-Reply-To: References: Message-ID: On Fri, 3 May 2024 07:32:35 GMT, Matthias Baesken wrote: > Seems MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc check uint values (see gc_globals.hpp). However those functions have uintx in the check functions. > This causes Ubsan to complain : > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function MaxTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:188: note: MaxTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x10541cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x1054253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x105f20b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x10538c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10342e71c in JavaMain java.c:491 > #5 0x103435248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function InitialTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, 
bool)' > jvmFlagConstraintsGC.cpp:177: note: InitialTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x117b1cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x117b253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x118620b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x117a8c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10077e71c in JavaMain java.c:491 > #5 0x100785248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) > > and > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:157:12: runtime error: call to function AllocatePrefetchStepSizeConstraintFunc(long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(int, bool)' > jvmFlagConstrain... Thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19074#issuecomment-2092973093 From mbaesken at openjdk.org Fri May 3 13:04:57 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 May 2024 13:04:57 GMT Subject: Integrated: 8331428: ubsan: JVM flag checking complains about MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc In-Reply-To: References: Message-ID: On Fri, 3 May 2024 07:32:35 GMT, Matthias Baesken wrote: > Seems MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc check uint values (see gc_globals.hpp). However those functions have uintx in the check functions. 
> This causes Ubsan to complain : > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function MaxTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:188: note: MaxTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x10541cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x1054253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x105f20b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x10538c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10342e71c in JavaMain java.c:491 > #5 0x103435248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x2442) > > /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:176:12: runtime error: call to function InitialTenuringThresholdConstraintFunc(unsigned long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(unsigned int, bool)' > jvmFlagConstraintsGC.cpp:177: note: InitialTenuringThresholdConstraintFunc(unsigned long, bool) defined here > #0 0x117b1cfbe in FlagAccessImpl_uint::typed_check_constraint(void*, unsigned int, bool) const jvmFlagAccess.cpp:176 > #1 0x117b253d7 in JVMFlagLimit::check_all_constraints(JVMFlagConstraintPhase) jvmFlagLimit.cpp:179 > #2 0x118620b98 in Threads::create_vm(JavaVMInitArgs*, bool*) threads.cpp:471 > #3 0x117a8c3fb in JNI_CreateJavaVM_inner(JavaVM_**, void**, void*) jni.cpp:3581 > #4 0x10077e71c in JavaMain java.c:491 > #5 0x100785248 in ThreadJavaMain java_md_macosx.m:720 > #6 0x7fff204338fb in _pthread_start+0xdf (libsystem_pthread.dylib:x86_64+0x68fb) > #7 0x7fff2042f442 in thread_start+0xe 
(libsystem_pthread.dylib:x86_64+0x2442)
> 
> and
> 
> /jdk/src/hotspot/share/runtime/flags/jvmFlagAccess.cpp:157:12: runtime error: call to function AllocatePrefetchStepSizeConstraintFunc(long, bool) through pointer to incorrect function type 'JVMFlag::Error (*)(int, bool)'
> jvmFlagConstrain...

This pull request has now been integrated.

Changeset: 9697bc38
Author: Matthias Baesken
URL: https://git.openjdk.org/jdk/commit/9697bc38586059d9bb020d3ca44a1c6cd7de315c
Stats: 11 lines in 4 files changed: 0 ins; 0 del; 11 mod

8331428: ubsan: JVM flag checking complains about MaxTenuringThresholdConstraintFunc, InitialTenuringThresholdConstraintFunc and AllocatePrefetchStepSizeConstraintFunc

Reviewed-by: stefank, aboldtch, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/19074

From stuefe at openjdk.org Fri May 3 13:19:01 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 3 May 2024 13:19:01 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v54]
In-Reply-To:
References: <8tX-E6rhvM3r0MhHkAmoCaxzyUrQ6ohmV8UDYMdokms=.77f5daee-f371-4ab8-ad98-337cb4fb4111@github.com>
Message-ID: <4Hch9nUc84xbWcyTMJapMgBQUC1-37IfPSx6wVv8Cv8=.24030e51-f8b7-4beb-a40d-4f18719edd4a@github.com>

On Tue, 30 Apr 2024 17:44:59 GMT, Thomas Stuefe wrote:

> > Okay, so this is irrespective of how the treap is implemented? I guess it'd check for:
> > 
> > - Non-degeneration of the depth of the tree (approximately log n)
> > - Uniqueness of keys
> > - Anything else??
> 
> Didn't read carefully enough. Obviously, check that keys are monotonically increasing. No need to check for uniqueness.
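
The checks discussed here can be sketched as a single in-order walk. This is not the `verify_self` from the PR: `ToyNode`, `ToyVerifyResult` and `toy_verify` are hypothetical names, and the sketch assumes a treap in which a parent's priority is never lower than its children's (the PR may order priorities the other way). Checking that keys are strictly increasing in order also rules out duplicates, which is why no separate uniqueness check is needed.

```c++
#include <cstddef>
#include <cstdint>

// Hypothetical node layout, for illustration only.
struct ToyNode {
  int key;
  std::uint64_t priority;
  ToyNode* left;
  ToyNode* right;
};

struct ToyVerifyResult {
  bool keys_increasing;   // in-order keys are strictly increasing
  bool heap_ordered;      // no child has a higher priority than its parent
  std::size_t max_depth;  // depth of the deepest node, root is depth 1
  std::size_t count;      // number of nodes visited
};

static void toy_verify_walk(const ToyNode* n, std::size_t depth,
                            const ToyNode*& prev, ToyVerifyResult& r) {
  if (n == nullptr) return;
  if (depth > r.max_depth) r.max_depth = depth;

  if (n->left != nullptr && n->left->priority > n->priority) r.heap_ordered = false;
  if (n->right != nullptr && n->right->priority > n->priority) r.heap_ordered = false;

  toy_verify_walk(n->left, depth + 1, prev, r);

  // In-order position: the previously visited key must be smaller.
  if (prev != nullptr && prev->key >= n->key) r.keys_increasing = false;
  prev = n;
  r.count++;

  toy_verify_walk(n->right, depth + 1, prev, r);
}

ToyVerifyResult toy_verify(const ToyNode* root) {
  ToyVerifyResult r{true, true, 0, 0};
  const ToyNode* prev = nullptr;
  toy_verify_walk(root, 1, prev, r);
  return r;
}
```

The depth check is left to the caller: `max_depth` can be compared against some multiple of log2(`count`), with the multiple chosen loosely enough that a randomized tree does not fail it by chance.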
------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2093000195 From gli at openjdk.org Fri May 3 13:23:03 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 3 May 2024 13:23:03 GMT Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure [v2] In-Reply-To: References: Message-ID: > Hi all, > > After [JDK-8296875](https://bugs.openjdk.org/browse/JDK-8296875), the classes `EncodeGCModeConcurrentFrameClosure` and `TransformStackChunkClosure` almost have the same code. This patch consolidates them into one. > > The tests `make test-hotspot_loom` and `make test-hotspot_gc` passed locally (linux & x64). Thanks for taking the time to review. > > Best Regards, > -- Guoxiong Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: Move RelativizeClosure into do_frame ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19084/files - new: https://git.openjdk.org/jdk/pull/19084/files/88f250fb..43268d76 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19084&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19084&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19084/head:pull/19084 PR: https://git.openjdk.org/jdk/pull/19084 From gli at openjdk.org Fri May 3 13:23:04 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 3 May 2024 13:23:04 GMT Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure [v2] In-Reply-To: <1QTQUUCrRthzWNXH0myS9SZFYafaSNpGOy3VuVgWfFk=.7c7e5385-cc38-4f1c-ac19-f8cd9bd9c26b@github.com> References: <1QTQUUCrRthzWNXH0myS9SZFYafaSNpGOy3VuVgWfFk=.7c7e5385-cc38-4f1c-ac19-f8cd9bd9c26b@github.com> Message-ID: On Fri, 3 May 2024 12:44:28 GMT, Stefan Karlsson wrote: > I don't think you need to worry about that. 
It could have been worth thinking about if there was a large state that needed to be set up for every frame, but that's not the case here. The compiler will likely handle this well. If you are still not convinced I think you could disassemble the code with and without the suggestion and compare the two. OK. I updated the code just now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19084#issuecomment-2093007519 From baikaishiuc at gmail.com Fri May 3 13:38:21 2024 From: baikaishiuc at gmail.com (zhengxianwei) Date: Fri, 3 May 2024 21:38:21 +0800 Subject: How can I correctly invoke external methods in the JVM interpreter? Message-ID: I have added the -Xint configuration to the JVM, and I would like to invoke external functions when executing bytecode instanceof. I have added the following method ``` JRT_ENTRY(void, InterpreterRuntime::dump_instanceof0(JavaThread* current)) //log_info(os)("invoke instanceof %s:%d", __FILE__, __LINE__); JRT_END ``` Then modify templateTable_x86.cpp ``` void TemplateTable::instanceof() { transition(atos, itos); Label done, is_null, ok_is_subtype, quicked, resolved; call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::dump_instanceof0)); __ testptr(rax, rax); __ jcc(Assembler::zero, is_null); ``` But compiling OpenJDK will result in errors ``` * For target support_images_jmods__create_java.desktop.jmod_exec: Error occurred during initialization of VM java.lang.InternalError: platform encoding not initialized at jdk.internal.util.SystemProps$Raw.platformProperties(java.base/Native Method) at jdk.internal.util.SystemProps$Raw.(java.base/SystemProps.java:263) at jdk.internal.util.SystemProps.initProperties(java.base/SystemProps.java:67) at java.lang.System.initPhase1(java.base/System.java:2167) ``` Could you please tell me what's incorrect about the method I implemented above ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gli at openjdk.org Fri May 3 13:46:05 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 3 May 2024 13:46:05 GMT Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure [v3] In-Reply-To: References: Message-ID: > Hi all, > > After [JDK-8296875](https://bugs.openjdk.org/browse/JDK-8296875), the classes `EncodeGCModeConcurrentFrameClosure` and `TransformStackChunkClosure` almost have the same code. This patch consolidates them into one. > > The tests `make test-hotspot_loom` and `make test-hotspot_gc` passed locally (linux & x64). Thanks for taking the time to review. > > Best Regards, > -- Guoxiong Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: Remove parameter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19084/files - new: https://git.openjdk.org/jdk/pull/19084/files/43268d76..b689c9e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19084&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19084&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19084/head:pull/19084 PR: https://git.openjdk.org/jdk/pull/19084 From mdoerr at openjdk.org Fri May 3 14:06:16 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 3 May 2024 14:06:16 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer Message-ID: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> `index_oop_from_field_offset_long` is sometimes used to access an absolute address by using `p == nullptr`. Unfortunately, `nullptr + byte_offset` implies undefined behavior and should better get fixed. UBSan complains about it (see JBS issue). 
A possible solution is to replace pointer arithmetic by integer arithmetic. We can use unsigned because `assert_field_offset_sane` checks that `byte_offset >= 0`. ------------- Commit messages: - 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer Changes: https://git.openjdk.org/jdk/pull/19087/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19087&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331626 Stats: 7 lines in 1 file changed: 0 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19087.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19087/head:pull/19087 PR: https://git.openjdk.org/jdk/pull/19087 From mbaesken at openjdk.org Fri May 3 14:15:51 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 May 2024 14:15:51 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer In-Reply-To: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: On Fri, 3 May 2024 14:01:34 GMT, Martin Doerr wrote: > `index_oop_from_field_offset_long` is sometimes used to access an absolute address by using `p == nullptr`. Unfortunately, `nullptr + byte_offset` implies undefined behavior and should better get fixed. UBSan complains about it (see JBS issue). > A possible solution is to replace pointer arithmetic by integer arithmetic. We can use unsigned because `assert_field_offset_sane` checks that `byte_offset >= 0`. Marked as reviewed by mbaesken (Reviewer). 
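As an illustration of the integer-arithmetic idea described in the quoted PR text, a minimal sketch might look like the following. It is not the actual patch: `address_from_base_and_offset` is a hypothetical helper name, and the sketch assumes (as the description says `assert_field_offset_sane` guarantees) that the offset is non-negative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only.
// Computes base + byte_offset with unsigned integer arithmetic, so that a
// null 'base' combined with an absolute address in 'byte_offset' does not
// go through pointer arithmetic on nullptr (undefined behavior).
static inline void* address_from_base_and_offset(void* base, int64_t byte_offset) {
  assert(byte_offset >= 0);  // stands in for the sanity check mentioned above
  uintptr_t base_bits = reinterpret_cast<uintptr_t>(base);
  return reinterpret_cast<void*>(base_bits + static_cast<uintptr_t>(byte_offset));
}
```

The pointer-to-integer mapping is implementation-defined, so this relies on the flat address model that the platforms in question use.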
------------- PR Review: https://git.openjdk.org/jdk/pull/19087#pullrequestreview-2038247954 From mli at openjdk.org Fri May 3 14:27:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 3 May 2024 14:27:03 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v2] In-Reply-To: References: Message-ID: <7eoTRo9miet61MlKRv6MRwFY3HCjyG4RiW6RGGJ4sAM=.982fad36-b2d9-4103-8a02-eca041a40e7d@github.com> > Hi, > Can you help to review this patch? > Both auto-vect and vector api depend on this intrinsic. > Thanks! > > ## Performance > No performance test was done, as this depends on the vcpop.v instruction in the zvbb extension and the code sequence is simpler than the non-intrinsic version. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix minor flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19065/files - new: https://git.openjdk.org/jdk/pull/19065/files/87797f3f..74692e23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19065/head:pull/19065 PR: https://git.openjdk.org/jdk/pull/19065 From duke at openjdk.org Fri May 3 15:46:55 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 3 May 2024 15:46:55 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: <3YIbGyddjWq_ouYSh2NOwTFDphoKW6SrpCqBoUTiBrE=.17390e6e-48a9-450e-934b-12ddf641c261@github.com> On Tue, 20 Feb 2024 11:11:33 GMT, Fei Yang wrote: >> Hello All, >> >> Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported.
>> >> Thank you, >> -Yuri Gaevsky >> >> **Correctness checks:** >> hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4. > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4256: > >> 4254: >> 4255: bind(VEC_LOOP); >> 4256: vsetvli(t0, cnt, Assembler::e8, Assembler::m8); > > I see `e8` element size is always used here for all cases. Maybe we could make use of some larger element size (according to `log2_array_indxscale` input) to improve the code? Especiall, the part for handling `idx`. Hi @RealFYang: do you expect any activity from my side here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17750#discussion_r1589379310 From sgehwolf at openjdk.org Fri May 3 15:57:53 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 3 May 2024 15:57:53 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2] In-Reply-To: References: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> Message-ID: On Tue, 16 Apr 2024 18:10:08 GMT, Thomas Stuefe wrote: >> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 351: >> >>> 349: // >>> 350: // We collect the read only mount option in the cgroup infos so as to have that >>> 351: // info ready when determining is_containerized(). >> >> Here, and in other places: a comment indicating the line format we scan would be appreciated, possibly with argument numbers. Saves the casual code reader from looking into proc man page. Even just pasting the example line for proc manpage would be fine (https://man7.org/linux/man-pages/man5/proc.5.html) (but with order adapted to your scanf call, they count major:minor as one) > > Trying to parse the `%s%*[^-]-` > > So, %s parses the mount options, until we encounter whitespace. Then %*[^-]- parses everything that is not a dash, until we encounter the dash? Then we eat the dash? This is to skip the optionals? Correct. 
Note that `%s %*[^-]` doesn't work for files without optionals. Since `%*[^-]` requires a non-empty match and the optionals are, well, optional :-) I've added more verbose comments to clarify this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1589390841 From sgehwolf at openjdk.org Fri May 3 15:57:57 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 3 May 2024 15:57:57 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2] In-Reply-To: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> References: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> Message-ID: <51Pz76bzLcZkgBLkoQeslRRTqztF2mIfSsvAZjo38uY=.7c1b4958-56b2-49c8-9311-74d83dfc355f@github.com> On Tue, 16 Apr 2024 18:16:33 GMT, Thomas Stuefe wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: >> >> - Merge branch 'master' into jdk-8261242-is-containerized-fix >> - jcheck fixes >> - Fix tests >> - Implement Metrics.isContainerized() >> - Some clean-up >> - Drop cgroups testing on plain Linux >> - Implement fall-back logic for non-ro controller mounts >> - Make find_ro static and local to compilation unit >> - 8261242: [Linux] OSContainer::is_containerized() returns true > > src/hotspot/os/linux/osContainer_linux.cpp line 78: > >> 76: const char *reason; >> 77: bool any_mem_cpu_limit_present = false; >> 78: bool ctrl_ro = cgroup_subsystem->is_containerized(); > > nit: naming? what does ctrl mean in this case? Maybe use "cgroup_is_containerized"? `ctrl` was short for `controller`. I've changed the naming. 
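The `%s%*[^-]-` scanning discussed earlier in this exchange can be tried out in isolation with a sketch like the one below. It is a simplified illustration, not the code from the PR: the struct, the function names and the buffer sizes are assumptions, and the field layout follows the `/proc/[pid]/mountinfo` description in the proc(5) man page (mount ID, parent ID, major:minor, root, mount point, mount options, optional fields, a `-` separator, filesystem type, ...).

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical holder for the fields of interest (illustration only).
struct MountInfoFields {
  char root[256];
  char mount_point[256];
  char mount_opts[256];
  char fs_type[64];
};

// Parses one mountinfo line such as:
//   36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
// "%255s%*[^-]-" reads the mount options, then skips everything up to the
// dash separator. The skipped part always contains at least the blank after
// the options, so the pattern also matches lines without optional fields.
static bool parse_mountinfo_line(const char* line, MountInfoFields* out) {
  int matched = sscanf(line,
                       "%*d %*d %*d:%*d %255s %255s %255s%*[^-]- %63s",
                       out->root, out->mount_point, out->mount_opts, out->fs_type);
  return matched == 4;  // suppressed (%*) conversions are not counted
}

// Returns true if the comma-separated option list contains the token "ro".
static bool has_ro_option(const char* mount_opts) {
  const char* p = mount_opts;
  while (*p != '\0') {
    const char* end = strchr(p, ',');
    size_t len = (end != nullptr) ? static_cast<size_t>(end - p) : strlen(p);
    if (len == 2 && strncmp(p, "ro", 2) == 0) return true;
    if (end == nullptr) break;
    p = end + 1;
  }
  return false;
}
```

Writing the format as `%s %*[^-]` instead would put a whitespace directive in front of the scanset; that directive consumes the blank, the scanset then finds the dash immediately and fails, which is the case pointed out above for lines without optional fields.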
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1589391426 From sgehwolf at openjdk.org Fri May 3 16:00:54 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 3 May 2024 16:00:54 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2] In-Reply-To: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> References: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> Message-ID: On Tue, 16 Apr 2024 18:21:29 GMT, Thomas Stuefe wrote: > Why return here? Because it's not useful to see containerized settings (other than the cg version in use) after this patch. The JVM won't use them (uses the physical settings instead). Why would you want to show the settings? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1589396352 From sgehwolf at openjdk.org Fri May 3 16:05:30 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 3 May 2024 16:05:30 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v3] In-Reply-To: References: Message-ID: > Please review this enhancement to the container detection code which allows it to figure out whether the JVM is actually running inside a container (`podman`, `docker`, `crio`), or with some other means that enforces memory/cpu limits by means of the cgroup filesystem. If neither of those conditions hold, the JVM runs in not containerized mode, addressing the issue described in the JBS tracker. For example, on my Linux system `is_containerized() == false" is being indicated with the following trace log line: > > > [0.001s][debug][os,container] OSContainer::init: is_containerized() = false because no cpu or memory limit is present > > > This state is being exposed by the Java `Metrics` API class using the new (still JDK internal) `isContainerized()` method. 
Example: > > > java -XshowSettings:system --version > Operating System Metrics: > Provider: cgroupv1 > System not containerized. > openjdk 23-internal 2024-09-17 > OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) > OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) > > > The basic property this is being built on is the observation that the cgroup controllers typically get mounted read only into containers. Note that the current container tests assert that `OSContainer::is_containerized() == true` in various tests. Therefore, using the heuristic of "is any memory or cpu limit present" isn't sufficient. I had considered that in an earlier iteration, but many container tests failed. > > Overall, I think, with this patch we improve the current situation of claiming a containerized system being present when it's actually just a regular Linux system. > > Testing: > > - [x] GHA (risc-v failure seems infra related) > - [x] Container tests on Linux x86_64 of cgroups v1 and cgroups v2 (including gtests) > - [x] Some manual testing using cri-o > > Thoughts? Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Add doc for mountinfo scanning. - Unify naming of variables - Merge branch 'master' into jdk-8261242-is-containerized-fix - Merge branch 'master' into jdk-8261242-is-containerized-fix - jcheck fixes - Fix tests - Implement Metrics.isContainerized() - Some clean-up - Drop cgroups testing on plain Linux - Implement fall-back logic for non-ro controller mounts - ... 
and 2 more: https://git.openjdk.org/jdk/compare/06fa7bd3...434430ca ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18201/files - new: https://git.openjdk.org/jdk/pull/18201/files/0df26ebd..434430ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18201&range=01-02 Stats: 82529 lines in 2377 files changed: 37138 ins; 34932 del; 10459 mod Patch: https://git.openjdk.org/jdk/pull/18201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18201/head:pull/18201 PR: https://git.openjdk.org/jdk/pull/18201 From sgehwolf at openjdk.org Fri May 3 16:05:30 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 3 May 2024 16:05:30 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 13:56:23 GMT, Jan Kratochvil wrote: > Anyway in this patch one could unify naming across variables/parameters, the same value is called `_is_ro`, `is_read_only`, `ro_opt`, `read_only`, `ro`. I've tried to unify the naming a bit. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18201#issuecomment-2093300919 From sgehwolf at openjdk.org Fri May 3 16:12:53 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 3 May 2024 16:12:53 GMT Subject: RFR: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container [v2] In-Reply-To: References: <8MpoLKDw6usz92EBH9R1XWfnX0E7NU5fd2dv8tob2ho=.455c310f-cadb-484d-a40f-6fd7e2c0811c@github.com> Message-ID: On Fri, 3 May 2024 15:58:11 GMT, Severin Gehwolf wrote: >> src/java.base/share/classes/sun/launcher/LauncherHelper.java line 375: >> >>> 373: if (!c.isContainerized()) { >>> 374: ostream.println(INDENT + "System not containerized."); >>> 375: return; >> >> Why return here? Would this not cut the output short in the non-containerized case? 
>> >> And if this not intended, the not-containerized-`-XshowSettings:system` test below should test and catch this (e.g. scan for CPU set) > >> Why return here? > > Because it's not useful to see containerized settings (other than the cg version in use) after this patch. The JVM won't use them (uses the physical settings instead). Why would you want to show the settings? To clarify. `showSettings:system` output on a host system: Operating System Metrics: Provider: cgroupv1 System not containerized. openjdk 23-internal 2024-09-17 OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) ... and in a container (with memory limit 500m): Operating System Metrics: Provider: cgroupv1 Effective CPU Count: 12 CPU Period: 100000us CPU Quota: -1 CPU Shares: -1 List of Processors, 12 total: 0 1 2 3 4 5 6 7 8 9 10 11 List of Effective Processors, 12 total: 0 1 2 3 4 5 6 7 8 9 10 11 List of Memory Nodes, 1 total: 0 List of Available Memory Nodes, 1 total: 0 Memory Limit: 500.00M Memory Soft Limit: Unlimited Memory & Swap Limit: 500.00M Maximum Processes Limit: 2048 openjdk 23-internal 2024-09-17 OpenJDK Runtime Environment (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk) OpenJDK 64-Bit Server VM (fastdebug build 23-internal-adhoc.sgehwolf.jdk-jdk, mixed mode, sharing) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18201#discussion_r1589407238 From never at openjdk.org Fri May 3 17:27:55 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 3 May 2024 17:27:55 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 05:44:04 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. 
[JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge tag 'jdk-23+21' into JDK-8326957 > > Added tag jdk-23+21 for changeset e833bfc8 > - Merge tag 'jdk-23+19' into JDK-8326957 > > Added tag jdk-23+19 for changeset 706b421c > - Remove extra space > - Use consistent terminology > - Merge tag 'jdk-23+17' into JDK-8326957 > > Added tag jdk-23+17 for changeset 8efd7aa6 > - Merge tag 'jdk-23+16' into JDK-8326957 > > Added tag jdk-23+16 for changeset d580bcf9 > - Update VMDeprecatedOptions.java test > - 8326957: Implementation of Deprecate Non-Generational ZGC Graal still doesn't support generational ZGC though I'm actively working on it. I'm hoping to have it in before rampdown but at least until that time JVMCI needs to default to non-generational ZGC. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18393#issuecomment-2093461126 From cjplummer at openjdk.org Fri May 3 18:33:53 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 May 2024 18:33:53 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: <_OZZbnPJvcnrwvlHdh9i-_kEBHdK3QEo_qGfG_nE3XE=.536f045e-65e6-4a37-a22a-080476bf9a21@github.com> On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote: > Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition. > This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. > > Testing: > - TBD: submit mach5 tiers 1-6 What "debugging option" are you referring to. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19054#issuecomment-2093550055 From alanb at openjdk.org Fri May 3 18:39:51 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 3 May 2024 18:39:51 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: <_OZZbnPJvcnrwvlHdh9i-_kEBHdK3QEo_qGfG_nE3XE=.536f045e-65e6-4a37-a22a-080476bf9a21@github.com> References: <_OZZbnPJvcnrwvlHdh9i-_kEBHdK3QEo_qGfG_nE3XE=.536f045e-65e6-4a37-a22a-080476bf9a21@github.com> Message-ID: On Fri, 3 May 2024 18:31:21 GMT, Chris Plummer wrote: > What "debugging option" are you referring to. `-Djdk.tracePinnedThreads=full`. When this system property is set then it means the onPinned callback is running the printing code. This happens in a transition when running with JVMTI enabled. It dates from early development in the loom repo and was a mistake to bring it into the main line. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19054#issuecomment-2093556941 From cjplummer at openjdk.org Fri May 3 18:58:51 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 May 2024 18:58:51 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote: > Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in the case of a temporary VTMS transition. > This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. > > Testing: > - TBD: submit mach5 tiers 1-6 Marked as reviewed by cjplummer (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19054#pullrequestreview-2038829389 From luhenry at openjdk.org Fri May 3 18:59:52 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 3 May 2024 18:59:52 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v2] In-Reply-To: <7eoTRo9miet61MlKRv6MRwFY3HCjyG4RiW6RGGJ4sAM=.982fad36-b2d9-4103-8a02-eca041a40e7d@github.com> References: <7eoTRo9miet61MlKRv6MRwFY3HCjyG4RiW6RGGJ4sAM=.982fad36-b2d9-4103-8a02-eca041a40e7d@github.com> Message-ID: On Fri, 3 May 2024 14:27:03 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Both auto-vect and vector api depends on this intrinsic. >> Thanks! >> >> ## Performance >> Not performance test was done, as this depends on vcpop.v instruction in zvbb extension and the code seqeunce is rather simple than non-intrinsic version. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix minor flag src/hotspot/cpu/riscv/globals_riscv.hpp line 118: > 116: product(bool, UseZihintpause, false, EXPERIMENTAL, \ > 117: "Use Zihintpause instructions") \ > 118: product(bool, UseZvbb, false, "Use Zvbb instructions") \ Shouldn't this be marked `EXPERIMENTAL` as we have no hardware to test it on? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1589624536 From luhenry at openjdk.org Fri May 3 19:01:55 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 3 May 2024 19:01:55 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: On Wed, 1 May 2024 14:58:58 GMT, Yuri Gaevsky wrote: >> Hi, Do you have plan to implement instrinsic `VectorCmpMasked`? It's part of `vectorizedMismatch` > >> Hi, Do you have plan to implement instrinsic `VectorCmpMasked`? It's part of `vectorizedMismatch` > > Hi @Hamlin-Li, > > I don't have such plan for the moment. Why do you think it should be a part of `_vectorizedMismatch` intrinsic? 
The similar [fix](https://github.com/openjdk/jdk/commit/b05c40ca3b5fd34cbbc7a9479b108a4ff2c099f1?diff=split&w=0) for X64 ([JDK-8266951](https://bugs.openjdk.org/browse/JDK-8266951)) looks like natural enhancement/followup for the original intrinsic functionality. @ygaevsky the `VectorCmpMasked` is to support partial inlining for small arrays: https://github.com/openjdk/jdk/blob/b33096f887108c3d7e1f4e62689c2b10401234fa/src/hotspot/share/opto/library_call.cpp#L6372-L6411 It very much complements this intrinsic and allows it to focus on larger arrays. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2093598840 From sspitsyn at openjdk.org Fri May 3 20:19:52 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 3 May 2024 20:19:52 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote: > Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition. > This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. > > Testing: > - TBD: submit mach5 tiers 1-6 Thank you for review, Chris! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19054#issuecomment-2093703364 From kevinw at openjdk.org Fri May 3 21:43:09 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Fri, 3 May 2024 21:43:09 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v8] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. 
> > Access to it in is_lock_owned() was racy, and caused rare crashes. Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check - fill_in assert update - JavaThread comment update and synchronizer check before cast - monitor->owner() == nullptr handling in fill_in - Missing include - Move is_lock_owned from Thread to JavaThread - Remove JavaThread's is_lock_owned - Feedback from Dean - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check - Add asserts around move_to calls - ... and 2 more: https://git.openjdk.org/jdk/compare/d8a4ee4e...b5380800 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/2989ad4c..b5380800 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=06-07 Stats: 6408 lines in 269 files changed: 3176 ins; 1347 del; 1885 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From sgibbons at openjdk.org Fri May 3 23:22:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 3 May 2024 23:22:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v18] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: - Merge remote-tracking branch 'origin/master' into indexof - Move arrays_equals back to c2_MacroAssembler - Merge branch 'openjdk:master' into indexof - Remove infinite loop (used for debugging) - Merge branch 'openjdk:master' into indexof - Cleaned up, ready for review - Pre-cleanup code - Add JMH. Add 16-byte compares to arrays_equals - Better method for mask creation - Merge branch 'openjdk:master' into indexof - ... 
and 40 more: https://git.openjdk.org/jdk/compare/b20fa7b4...f52d281d ------------- Changes: https://git.openjdk.org/jdk/pull/16753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=17 Stats: 4345 lines in 17 files changed: 4183 ins; 26 del; 136 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From crschnick at xpipe.io Sat May 4 00:49:35 2024 From: crschnick at xpipe.io (Christopher Schnick) Date: Sat, 4 May 2024 02:49:35 +0200 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications Message-ID: Hello there, I wasn't entirely sure whether this is the correct mailing list for this, but it was the best match for me skimming through all the available mailing lists. Feel free to point me to a better suited one if I'm wrong here. We develop and distribute Java desktop applications to users by creating standalone application images with jpackage. Everything is working fine, however there was a recent issue where some users couldn't get the application to work correctly. After some investigation, it turned out that the affected users had set the environment variable _JAVA_OPTIONS with a few JVM arguments, particularly Xmx parameters that were way too low for our application. I was quite surprised that these apply to self contained jpackage applications, because for me this is not in the spirit of an isolated and self contained application. I was even more surprised that it overwrote existing arguments as we had our own values for Xmx set in the application image, but these were ignored in favour of _JAVA_OPTIONS. And I'm under the impression that this behavior cannot be disabled. 
(Please correct me if I'm wrong) While I see that there is definitely some use case for having this option available to allow users to customize their environment uniformly, I would say that this usually causes more harm than good in this case. Cases of unintentional interference are probably far more common than intentional configuration, which requires specific application knowledge to work in the first place. If someone has set up a few Java 8 applications on their system via normal jars and has configured a few options for them, I don't want them to apply to my application image that runs on Java 21. As the developer, I also don't want the user even having to bother with thinking about this possibility. I also don't even know if the application starts up if the variable contains unrecognized options. Overall I'm not advocating here for fully removing this behavior, but for at least thinking about giving application developers some option to disable external JVM argument sourcing for jlink/jpackage. I hope that this proposal can be considered. Best Christopher Schnick From gli at openjdk.org Sat May 4 03:14:06 2024 From: gli at openjdk.org (Guoxiong Li) Date: Sat, 4 May 2024 03:14:06 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v2] In-Reply-To: <2E8psdsbHlnXaWjLMnhAHsoywFxY-jWEhHqAU4699_8=.83ba590a-2357-4924-a74a-e972b70b60da@github.com> References: <2E8psdsbHlnXaWjLMnhAHsoywFxY-jWEhHqAU4699_8=.83ba590a-2357-4924-a74a-e972b70b60da@github.com> Message-ID: On Fri, 3 May 2024 12:49:03 GMT, Albert Mingkun Yang wrote: >> It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probably fail), fall back to full-gc. >> >> Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outside the VM (by a third-party tool).
>> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > s1-do-collect Nice refactor. src/hotspot/share/gc/serial/serialHeap.cpp line 442: > 440: } > 441: > 442: bool SerialHeap::do_young_gc(DefNewGeneration* young_gen, bool clear_soft_refs) { The parameter `DefNewGeneration* young_gen` is not necessary. We can use the field `SerialHeap::_young_gen` directly. src/hotspot/share/gc/serial/serialHeap.cpp line 461: > 459: if (should_verify && VerifyBeforeGC) { > 460: prepare_for_verify(); > 461: Universe::verify("Before GC"); May the prefix of the verification log be better to specify the minor or full GC? Such as `Before Minor GC` here. src/hotspot/share/gc/serial/serialHeap.cpp line 463: > 461: Universe::verify("Before GC"); > 462: } > 463: gc_prologue(false); The parameter `full` of the method `SerialHeap::gc_prologue` doesn't been used. Seems a leftover of [JDK-8323993](https://bugs.openjdk.org/browse/JDK-8323993). src/hotspot/share/gc/serial/serialHeap.cpp line 468: > 466: gen->stat_record()->accumulated_time.stop(); > 467: > 468: update_gc_stats(gen, full); The method `update_gc_stats` is only used by young-gen to sample the promoted size. It is good to rename and simplify the related code. I filed https://bugs.openjdk.org/browse/JDK-8331684 to follow up. src/hotspot/share/gc/serial/serialHeap.cpp line 660: > 658: } > 659: do_full_collection_no_gc_locker(clear_soft_refs); > 660: } Please note the difference between the previous `SerialHeap::do_collection` and `SerialHeap::collect_at_safepoint_no_gc_locker` here. The previous `SerialHeap::do_collection` may invoke full GC according to the method `SerialHeap::should_do_full_collection` even the young GC succeeded. But `SerialHeap::collect_at_safepoint_no_gc_locker` only invokes full GC when the young GC failed (because of failed promotion). 
This change leaves `SerialHeap::should_do_full_collection` with no users. If the behaviour of `SerialHeap::collect_at_safepoint_no_gc_locker` is what you intend, I think it would be good to remove `SerialHeap::should_do_full_collection`. ------------- Changes requested by gli (Committer). PR Review: https://git.openjdk.org/jdk/pull/19056#pullrequestreview-2039185857 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1589844934 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1589860506 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1589859847 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1589857649 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1589863014 From duke at openjdk.org Sat May 4 03:32:19 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 4 May 2024 03:32:19 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v7] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: <-j4lM2hWQmtYu9jJ9WQ8l_dhoU4newkCfxq9-66_Gx4=.e6633ac1-2ab1-4d45-b051-ef589e1412bd@github.com> > follow up 8267941 Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: - rename - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694 - review - Merge branch 'master' into JDK-8330694 - fix indentation - also tidy up - tidy up - rename ------------- Changes: https://git.openjdk.org/jdk/pull/18871/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=06 Stats: 999 lines in 124 files changed: 1 ins; 4 del; 994 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From duke at openjdk.org Sat May 4 03:42:17 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 4 May 2024 03:42:17 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v8] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: > follow up 8267941 Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18871/files - new: https://git.openjdk.org/jdk/pull/18871/files/4d175e27..80eeb443 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=06-07 Stats: 15 lines in 5 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From duke at openjdk.org Sat May 4 03:45:31 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 4 May 2024 03:45:31 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v9] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: 
<3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: <7Aud9EX-Q09Bx3MmZjM182gBp9sDmbvIt7rSmtBa1FM=.cc43a81c-7431-484d-9eae-295da93c9a52@github.com> > follow up 8267941 Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18871/files - new: https://git.openjdk.org/jdk/pull/18871/files/80eeb443..b007eb01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=07-08 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From aph-open at littlepinkcloud.com Sat May 4 10:09:33 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 4 May 2024 11:09:33 +0100 Subject: How can I correctly invoke external methods in the JVM interpreter? In-Reply-To: References: Message-ID: <50c00ef3-5163-4652-9957-4fec2192bb0f@littlepinkcloud.com> On 5/3/24 14:38, zhengxianwei wrote: > Could you please tell me what's incorrect about the method I implemented above ? Think about register contents. Calls from the interpreter to native code use the C calling convention, which preserves the contents of some registers but clobbers others. You need to know that calling convention, and ensure that anything in use gets saved when you call native code. Also, you must step through the generated code in a debugger. Then you would see what had gone wrong. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sspitsyn at openjdk.org Sat May 4 10:21:52 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 4 May 2024 10:21:52 GMT Subject: RFR: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 01:54:24 GMT, Alex Menkov wrote: >> Some cleanup related to JvmtiEnvBase::get_threadOop_and_JavaThread method >> >> Testing: tier1-6 > > Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: > > - update > - Revert "renamed current_thread to current" > > This reverts commit d5d614bcf0861466acd695296e974d2253f84c9f. > - Revert "renamed current_thread tp current" > > This reverts commit 4602632221044aa754a1bc8d11e7a3e9a0092590. Looks good. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18986#pullrequestreview-2039319268 From Alan.Bateman at oracle.com Sat May 4 10:41:50 2024 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Sat, 4 May 2024 11:41:50 +0100 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: References: Message-ID: On 04/05/2024 01:49, Christopher Schnick wrote: > Hello there, > > I wasn't entirely sure whether this is the correct mailing list for > this, but it was the best match for me skimming through all the > available mailing lists. Feel free to point me to a better suited one > if I'm wrong here. > > We develop and distribute Java desktop applications to users by > creating standalone application images with jpackage. Everything is > working fine, however there was a recent issue where some users > couldn't get the application to work correctly. 
After some > investigation, it turned out that the affected users had set the > environment variable _JAVA_OPTIONS with a few JVM arguments, > particularly Xmx parameters that were way too low for our application. > I was quite surprised that these apply to self contained jpackage > applications, because for me this is not in the spirit of an isolated > and self contained application. I was even more surprised that it > overwrote existing arguments as we had our own values for Xmx set in > the application image, but these were ignored in favour of > _JAVA_OPTIONS. And I'm under the impression that this behavior cannot > be disabled. (Please correct me if I'm wrong) > > While I see that there is definitely some use case for having this > option available to allow users to customize their environment > uniformly, I would say that this causes usually more harm than good in > this case. The cases of unintentional interference are probably much > higher than intentional configuration, which requires specific > application knowledge to work in the first place. _JAVA_OPTIONS is a legacy environment variable from early JDK releases. Yes, it does append (rather than prepend) so it overrides options. It's not a documented env variable but it it has existed for a long time and people seem to have found it. So great care would be required before changing anything. The supported/documented environment variables are JAVA_TOOL_OPTIONS and JDK_JAVA_OPTIONS. They prepend so they don't override the options specified on the command line. There are important use-cases. JAVA_TOOL_OPTIONS is documented in the JVMTI spec as a way to insert options to start tool agents (-agentlib or -agentpath). JDK_JAVA_OPTIONS is JDK-specific and has its own section in the java man page. It supports both java launcher options and VM options.? 
It was an important piece to help the migration from JDK 8 to newer releases for deployments that try to use the same command lines across a broad range of JDK releases. My guess from reading your mail is that the desktop application doesn't have a console so the "Picked up _JAVA_OPTIONS=" message that is printed is not seen. In server applications the message will typically end up at the top of the application's log file so you can see what has been picked up. -Alan From tanksherman27 at gmail.com Sat May 4 13:46:01 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Sat, 4 May 2024 21:46:01 +0800 Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation In-Reply-To: References: Message-ID: Glad to help! Paging for David and Thomas again, who'll probably be able to help you more than I can best regards, Julian On Fri, May 3, 2024 at 3:54 PM zhengxianwei wrote: > I carefully analyzed it and found that what you said is actually correct. > > I didn't understand it correctly initially. > > Thanks again for your explanation > > On Fri, May 3, 2024 at 11:03 AM Julian Waters > wrote: > >> Hi Xian Wei, >> >> No, you are right! The code in templateTable_x86.cpp that you linked to >> in your post is not part of the Just in Time Compilers, it is part of the >> x86 Interpreter! The Java HotSpot VM actually has 2 different Interpreters, >> the primary Interpreter is written in large chunks of assembly specific to >> each platform, which is then processed by the HotSpot macro assemblers. The >> bytecodeInterpreter.cpp file you linked to is part of the second and less >> often used Interpreter, which is why modifying the bytecodeInterpreter.cpp >> instanceof implementation did nothing in your case (The Interpreter used >> actually depends on the platform, and the secondary Interpreter is not used >> on ARM or x86).
The details on the macro assemblers unfortunately elude me >> since I am not a HotSpot expert (Although I hope to be one day), but to >> understand how instanceof works on x86 and ARM, you need to understand both >> x86 and ARM assembly. The Interpreter's instanceof opcode is implemented on >> x86 in >> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243 >> and on ARM, it is implemented in >> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/arm/templateTable_arm.cpp#L4182 >> >> Happy to help! >> >> best regards, >> Julian >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From crschnick at xpipe.io Sat May 4 17:22:12 2024 From: crschnick at xpipe.io (Christopher Schnick) Date: Sat, 4 May 2024 19:22:12 +0200 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: References: Message-ID: Hey Alan, in our specific case we are distributing desktop applications to end users where we take great care to make them user friendly and easy to use. In most instances, the users probably don't even know that our application runs on Java. A big part in accomplishing this was the introduction of jlink/jpackage. We can just say: Download this application, you don't need anything else. If they ask about installing Java, we can just say it does not matter what kind of Java you have installed and configured on your system because our application is self contained. But now it seems like we have to augment this statement by saying that it doesn't matter what you have installed unless you have set _JAVA_OPTIONS or JAVA_TOOL_OPTIONS, then it might be possible that the application will not behave as expected or does not start up at all without any error message. And we can't do anything about it, you have to ask your IT admin. 
As a practical use case, if an office deploys a few legacy Java 8 applications on their systems that are configured via any environment variables to behave uniformly, there is a good chance of an option being included in there that is not supported by Java 21. If this is the case, then ALL modern self contained graphical Java applications don't even start up and don't show an error message. (I just tried; I remember that jpackage showed a generic error window some time ago, but it does not do that for me anymore.) If the option is at least recognized but completely wrong for the requirements of the application, it will quickly error out, which isn't much better. This is horrible behavior in this use case. Not every Java application is a scalable or configurable one; there are cases in which user configuration does not make sense and is most likely unintentional. Any possible configuration can be exposed by the application itself. All I'm asking for is to consider giving developers an option in jlink that is explicitly opt-in to disable this environment variable sourcing. That way the original behavior would not be changed unless explicitly requested by the developer. Best Christopher Schnick On 04/05/2024 12:41, Alan Bateman wrote: > On 04/05/2024 01:49, Christopher Schnick wrote: >> Hello there, >> >> I wasn't entirely sure whether this is the correct mailing list for >> this, but it was the best match for me skimming through all the >> available mailing lists. Feel free to point me to a better suited one >> if I'm wrong here. >> >> We develop and distribute Java desktop applications to users by >> creating standalone application images with jpackage. Everything is >> working fine, however there was a recent issue where some users >> couldn't get the application to work correctly.
After some >> investigation, it turned out that the affected users had set the >> environment variable _JAVA_OPTIONS with a few JVM arguments, >> particularly Xmx parameters that were way too low for our >> application. I was quite surprised that these apply to self contained >> jpackage applications, because for me this is not in the spirit of an >> isolated and self contained application. I was even more surprised >> that it overwrote existing arguments as we had our own values for Xmx >> set in the application image, but these were ignored in favour of >> _JAVA_OPTIONS. And I'm under the impression that this behavior cannot >> be disabled. (Please correct me if I'm wrong) >> >> While I see that there is definitely some use case for having this >> option available to allow users to customize their environment >> uniformly, I would say that this causes usually more harm than good >> in this case. The cases of unintentional interference are probably >> much higher than intentional configuration, which requires specific >> application knowledge to work in the first place. > > _JAVA_OPTIONS is a legacy environment variable from early JDK > releases. Yes, it does append (rather than prepend) so it overrides > options. It's not a documented env variable but it it has existed for > a long time and people seem to have found it. So great care would be > required before changing anything. > > The supported/documented environment variables are JAVA_TOOL_OPTIONS > and JDK_JAVA_OPTIONS. They prepend so they don't override the options > specified on the command line. There are important use-cases. > JAVA_TOOL_OPTIONS is documented in the JVMTI spec as a way to insert > options to start tool agents (-agentlib or -agentpath). > JDK_JAVA_OPTIONS is JDK-specific and has its own section in the java > man page. It supports both java launcher options and VM options.? 
It > was an important piece to help the migration from JDK 8 to newer > releases for deployments that try to use the same command lines across > a broad range of JDK releases. > > My guess from reading your mail is that the desktop application > doesn't have a console so the "Picked up _JAVA_OPTIONS=" > message that is printed is not seen. In server applications the > message will typically end up at the top of the application's log file > so you can see what has been picked up. > > -Alan > > From sgibbons at openjdk.org Sat May 4 19:35:21 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 4 May 2024 19:35:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > 
StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Rearrange; add lambdas for clarity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f52d281d..fb4da92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=17-18 Stats: 2561 lines in 1 file changed: 804 ins; 954 del; 803 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From tanksherman27 at gmail.com Sun May 5 07:46:45 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Sun, 5 May 2024 15:46:45 +0800 Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation In-Reply-To: References: Message-ID: By the way, when you reply to someone, you should also cc to hotspot-dev at openjdk.org, for your message to show up on the mailing lists. That way, more people will see it and your chances of them helping you increase best regards, Julian On Fri, May 3, 2024 at 3:54?PM zhengxianwei wrote: > I carefully analyzed it and found that what you said is actually correct. > > I didn't understand it correctly initially. > > Thanks again for your explanation > > On Fri, May 3, 2024 at 11:03?AM Julian Waters > wrote: > >> Hi Xian Wei, >> >> No, you are right! The code in templateTable_x86.cpp that you linked to >> in your post is not part of the Just in Time Compilers, it is part of the >> x86 Interpreter! The Java HotSpot VM actually has 2 different Interpreters, >> the primary Interpreter is written in large chunks of assembly specific to >> each platform, which is then processed by the HotSpot macro assemblers. 
The >> bytecodeInterpreter.cpp file you linked to is part of the second and less >> often used Interpreter, which is why modifying the bytecodeInterpreter.cpp >> instanceof implementation did nothing in your case (The Interpreter used >> actually depends on the platform, and the secondary Interpreter is not used >> on ARM or x86). The details on the macro assemblers unfortunately elude me >> since I am not a HotSpot expert (Although I hope to be one day), but to >> understand how instanceof works on x86 and ARM, you need to understand both >> x86 and ARM assembly. The Interpreter's instanceof opcode is implemented on >> x86 in >> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243 >> and on ARM, it is implemented in >> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/arm/templateTable_arm.cpp#L4182 >> >> Happy to help! >> >> best regards, >> Julian >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From baikaishiuc at gmail.com Sun May 5 08:13:12 2024 From: baikaishiuc at gmail.com (zhengxianwei) Date: Sun, 5 May 2024 16:13:12 +0800 Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation In-Reply-To: References: Message-ID: Thank you. This is my first time using the mailing list, and I wasn't aware of this issue. I'll make sure to cc o hotspot-dev at openjdk.org now. :-) On Sun, May 5, 2024 at 3:47?PM Julian Waters wrote: > By the way, when you reply to someone, you should also cc to > hotspot-dev at openjdk.org, for your message to show up on the mailing > lists. That way, more people will see it and your chances of them helping > you increase > > best regards, > Julian > > On Fri, May 3, 2024 at 3:54?PM zhengxianwei wrote: > >> I carefully analyzed it and found that what you said is actually correct. >> >> I didn't understand it correctly initially. 
>> >> Thanks again for your explanation >> >> On Fri, May 3, 2024 at 11:03?AM Julian Waters >> wrote: >> >>> Hi Xian Wei, >>> >>> No, you are right! The code in templateTable_x86.cpp that you linked to >>> in your post is not part of the Just in Time Compilers, it is part of the >>> x86 Interpreter! The Java HotSpot VM actually has 2 different Interpreters, >>> the primary Interpreter is written in large chunks of assembly specific to >>> each platform, which is then processed by the HotSpot macro assemblers. The >>> bytecodeInterpreter.cpp file you linked to is part of the second and less >>> often used Interpreter, which is why modifying the bytecodeInterpreter.cpp >>> instanceof implementation did nothing in your case (The Interpreter used >>> actually depends on the platform, and the secondary Interpreter is not used >>> on ARM or x86). The details on the macro assemblers unfortunately elude me >>> since I am not a HotSpot expert (Although I hope to be one day), but to >>> understand how instanceof works on x86 and ARM, you need to understand both >>> x86 and ARM assembly. The Interpreter's instanceof opcode is implemented on >>> x86 in >>> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243 >>> and on ARM, it is implemented in >>> https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/arm/templateTable_arm.cpp#L4182 >>> >>> Happy to help! >>> >>> best regards, >>> Julian >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From baikaishiuc at gmail.com Sun May 5 08:14:11 2024 From: baikaishiuc at gmail.com (zhengxianwei) Date: Sun, 5 May 2024 16:14:11 +0800 Subject: Fwd: How can I correctly invoke external methods in the JVM interpreter? 
In-Reply-To: References: <50c00ef3-5163-4652-9957-4fec2192bb0f@littlepinkcloud.com> Message-ID: ---------- Forwarded message --------- From: zhengxianwei Date: Sun, May 5, 2024 at 9:34?AM Subject: Re: How can I correctly invoke external methods in the JVM interpreter? To: Andrew Haley Thank you for your reply. Yesterday, I analyzed the code and found that when CALL_VM is called, it needs to save and restore the register group, while I assumed that CALL_VM itself included the operation to restore the context. Now it's working fine > Calls from the interpreter to > native code use the C calling convention This is something I really need > Also, you must step through the generated code in a debugger Yes, you're right. I need to track the generated assembly code. A few days ago, when I was analyzing the issue, I found debugging the JVM with both JDB and GDB to be cumbersome, so I was reluctant to spend time debugging the code. Now it seems I need to re-understand the entire process On Sat, May 4, 2024 at 6:09?PM Andrew Haley wrote: > On 5/3/24 14:38, zhengxianwei wrote: > > Could you please tell me what's incorrect about the method I implemented > above ? > > Think about register contents. Calls from the interpreter to > native code use the C calling convention, which preserves the > contents of some registers but clobbers others. You need to know > that calling convention, and ensure that anything in use gets > saved when you call native code. > > Also, you must step through the generated code in a debugger. > Then you would see what had gone wrong. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aph-open at littlepinkcloud.com Sun May 5 08:25:33 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sun, 5 May 2024 09:25:33 +0100 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: References: Message-ID: <00448fac-9299-429e-906a-9c17942c0437@littlepinkcloud.com> On 5/4/24 18:22, Christopher Schnick wrote: > All I'm asking for is to consider giving > developers an option in jlink that is explicitly opt-in to disable this > environment variable sourcing. That way the original behavior would not > be changed unless explicitly requested by the developer. If I had to solve the problem quickly (as in, less than a year or so) I'd consider creating a custom launcher. I think that would reset any environment variables then start Java. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jsjolen at openjdk.org Sun May 5 09:08:57 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 5 May 2024 09:08:57 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v54] In-Reply-To: References: <8tX-E6rhvM3r0MhHkAmoCaxzyUrQ6ohmV8UDYMdokms=.77f5daee-f371-4ab8-ad98-337cb4fb4111@github.com> Message-ID: On Tue, 30 Apr 2024 17:44:59 GMT, Thomas Stuefe wrote: >If I am not mistaken, it also seems more expensive? A remove node needs two splits and a merge, both seem to be dependent on tree depth. Removing the node via find-and-rotate-til-its-a-leaf only needs one tree traversal (first find the node, then rotate down until its a leaf). It is more expensive AFAICS. Remove with split/merge needs three passes down the tree, remove with rotation needs a pass down the tree and then a pass up the tree. So 3*log(n) vs 2*log(n) ;-). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2094697902 From jsjolen at openjdk.org Sun May 5 09:30:58 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 5 May 2024 09:30:58 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v5] In-Reply-To: References: <-XAziSwGMo20pUAnbdRW1JUk_0ZB-80RVfAHr0iuewE=.bff8f2f7-01e2-46eb-bd4b-1b16fccc6aa1@github.com> <3al4DjsRcIX_qJZNbTGqBDIAOj4bU5l8xpYPHQE8cNM=.7cc0bdfe-c9c8-46ce-ad42-397c61b5a603@github.com> <7u3imUh6-qb_wLdyZ4mn5SfnEOkxyFEQ20O0fb6WJj0=.3179edcb-0340-4d50-a674-c18128cc2e2f@github.com> <3z6o8urlRN3qEViyH6CMdXYByP0LR8mMKBYVe9_xKGI=.db9bb8d2-d4c1-4b23-9667-c0a9b7d7b94f@github.com> Message-ID: On Tue, 30 Apr 2024 11:29:37 GMT, Thomas Stuefe wrote: >>> Should the indexes not be stable across resize? >> >> **No.** The hash is determined as: `int place_to_put_element = hash_of(the_thing) % size_of_array;` >> >> The `size_of_array` will change, so when probing for/inserting the same NCS after a resize a new index may be used. Meaning, we will have duplicate entries. If we're OK with this, then that's fine. It means that equality checking will require dereferencing the index and doing the full NCS comparison. >> >> ```c++ >> GA ht(2); // Size 2 >> int oldidx = hash(4) % ht.size(); // oldidx == 0 >> ht.put(oldidx, 4); >> // Out of room, resize >> ht.grow(4); >> // Now imagine you insert oldidx into some treap node's metadata >> // Now we're adding the same int, 4, again but get a different index >> int newidx = hash(4) % ht.size(); // newidx == 2 >> // Now what? > > Ah, I get the confusion. This is not what I meant. > > What I mean was: > > At the moment you malloc space for NativeCallStack, then keep NativeCallStack* in the hash map. NativeCallStack* now uniquely identifies your stack. > > What I meant is to place NativeCallStack in a growable array. Now, you have a 32-bit or even a 16-bit index into that array. 
That index uniquely identifies the stack. You keep that index the hashmap. The hashmap does not change. Hashmap storage has nothing to do with that array. This is not the bucket array. > > Basically, you replace the malloc for the NativeCallStack with a placement-new in a new growable array. The rest stays the same. > > But now, you have a 32-bit or even 16-bit index, and that is smaller than a native pointer, which makes it possible to encode the stack information in a tree node much more succinctively. This makes it possible to encode the whole tree node metainfo very comfortably in a single 64-bit value. You can even get both in- and out-state of the VMATree into a single 64-bit value like this: > > bits 0-7 MEMFLAGS in > bits 8-16 State in > bits 16-31 callstack index in > > bits 32-39 MEMFLAGS out > bits 40-47 State out > bits 48-63 callstack index out Oh right of course, just store the NCS separately to the closed-addressing hashtable. I'm going for a 32-bit value just because that's the quickest. We can do a further compression round in a future PR. If we really wanted to, we could also store the `Link`s in a GA and thus reduce their pointer sizes to 32-bits also. Still, future PR, IMHO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1590261943 From crschnick at xpipe.io Sun May 5 22:25:04 2024 From: crschnick at xpipe.io (Christopher Schnick) Date: Mon, 6 May 2024 00:25:04 +0200 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <00448fac-9299-429e-906a-9c17942c0437@littlepinkcloud.com> References: <00448fac-9299-429e-906a-9c17942c0437@littlepinkcloud.com> Message-ID: <779da9f1-c969-4d41-9f15-71d315122fb6@xpipe.io> Since there already is the jpackage tool to create an application launcher, such an option could also be added to jpackage if it better fits there. Alternatively, a JVM argument to disable external environment variable argument sourcing would also work. 
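The custom-launcher idea from this thread could look roughly like the following. This is a minimal sketch, assuming a POSIX system; the helper names `scrub_java_option_env` and `launch_java`, and resolving `java` through `PATH`, are illustrative assumptions and not an existing jlink or jpackage feature. It clears the three environment variables that the `java` launcher or HotSpot read extra options from, then replaces the process with `java`.

```cpp
#include <cstdlib>
#include <string>
#include <vector>
#include <unistd.h>

// Environment variables from which the java launcher or HotSpot pick up extra options.
static const char* const kJavaOptionVars[] = {
  "_JAVA_OPTIONS", "JAVA_TOOL_OPTIONS", "JDK_JAVA_OPTIONS"
};

// Remove the option-carrying variables from this process's environment.
inline void scrub_java_option_env() {
  for (const char* name : kJavaOptionVars) {
    unsetenv(name);
  }
}

// Hypothetical wrapper entry point: scrub the environment, then exec "java"
// (looked up via PATH) with the given arguments. Returns only if exec fails.
inline int launch_java(const std::vector<std::string>& args) {
  scrub_java_option_env();
  std::vector<char*> argv;
  argv.push_back(const_cast<char*>("java"));
  for (const std::string& a : args) {
    argv.push_back(const_cast<char*>(a.c_str()));
  }
  argv.push_back(nullptr);
  execvp("java", argv.data());
  return 127;  // conventional "command could not be executed" status
}
```

A real launcher would more likely exec the `java` binary bundled in the runtime image by absolute path instead of relying on `PATH`.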
On 05/05/2024 10:25, Andrew Haley wrote: > On 5/4/24 18:22, Christopher Schnick wrote: >> All I'm asking for is to consider giving >> developers an option in jlink that is explicitly opt-in to disable this >> environment variable sourcing. That way the original behavior would not >> be changed unless explicitly requested by the developer. > > If I had to solve the problem quickly (as in, less than a year or > so) I'd consider creating a custom launcher. I think that would reset > any environment variables then start Java. > From duke at openjdk.org Mon May 6 01:42:06 2024 From: duke at openjdk.org (Jin Guojie) Date: Mon, 6 May 2024 01:42:06 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder Message-ID: 8331558: AArch64: optimize integer remainder On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 Add full platform coverage for Neoverse variants in vm_version.?pp The following test has passed, which shows definite performance improvement. 
make test TEST="micro:java.lang.IntegerDivMod" make test TEST="micro:java.lang.LongDivMod" * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this patch(ns/ops) 1885 improvement(%) 17.93% * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this patch(ns/ops) 1885 improvement(%) 18.03% * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this patch(ns/ops) 1894 improvement(%) 17.79% * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this patch(ns/ops) 1891 improvement(%) 18.03% ------------- Commit messages: - Update vm_version_aarch64.hpp - 8331558: AArch64: optimize integer remainder - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 Changes: https://git.openjdk.org/jdk/pull/19093/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331558 Stats: 66 lines in 4 files changed: 51 ins; 9 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Mon May 6 03:33:30 2024 From: duke at openjdk.org (Liming Liu) Date: Mon, 6 May 2024 03:33:30 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: > The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14.
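The "check the kernel version first" step described above can be sketched as follows. This is only an illustration under stated assumptions: the names `KernelVersion`, `parse_kernel_release`, `is_at_least` and `kernel_may_support_populate_write` are hypothetical and are not taken from the PR, whose actual code may differ. The sketch reads the release string via `uname` and only reports that probing `MADV_POPULATE_WRITE` is worthwhile when the kernel is 5.14 or newer; the availability probe itself is left out.

```cpp
#include <cstdio>
#include <sys/utsname.h>

struct KernelVersion {
  int major;
  int minor;
};

// Parse the leading "major.minor" of a release string such as "5.4.17".
// Returns false if the string does not start with two dot-separated integers.
inline bool parse_kernel_release(const char* release, KernelVersion* out) {
  return std::sscanf(release, "%d.%d", &out->major, &out->minor) == 2;
}

// True if version v is the same as or newer than major.minor.
inline bool is_at_least(const KernelVersion& v, int major, int minor) {
  return v.major > major || (v.major == major && v.minor >= minor);
}

// True only if the running kernel reports version 5.14 or newer, i.e. the
// point from which probing MADV_POPULATE_WRITE makes sense at all.
inline bool kernel_may_support_populate_write() {
  struct utsname u;
  KernelVersion v;
  if (uname(&u) != 0 || !parse_kernel_release(u.release, &v)) {
    return false;  // be conservative when the version cannot be determined
  }
  return is_at_least(v, 5, 14);
}
```

Gating on the version first avoids issuing advice value 23 on an older vendor kernel where, as described above, the same number carries a different meaning.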
Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Fix the wrong condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18592/files - new: https://git.openjdk.org/jdk/pull/18592/files/318f0261..fe98ec0a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18592&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18592.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18592/head:pull/18592 PR: https://git.openjdk.org/jdk/pull/18592 From duke at openjdk.org Mon May 6 05:50:13 2024 From: duke at openjdk.org (Jin Guojie) Date: Mon, 6 May 2024 05:50:13 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this patch(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this patch(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this patch(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this patch(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into dev - Update vm_version_aarch64.hpp - 8331558: AArch64: optimize integer remainder On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 Add full platform coverage for Neoverse variants in vm_version.?pp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/226f7832..786d5016 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=00-01 Stats: 6731 lines in 285 files changed: 3391 ins; 1390 del; 1950 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From fyang at openjdk.org Mon May 6 06:22:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 May 2024 06:22:54 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> Message-ID: On Thu, 2 May 2024 23:17:45 GMT, Fei Yang wrote: >> We still need relocates rt_call, not sure why you removed it. >> It seem like we need two version of rt_call one with address and one with Address. >> Then it seem like we could remove far_call as the rt_call would do the right thing. >> >> I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch. >> Is there a reason we need to include such in changes in this PR? > >> We still need relocates rt_call, not sure why you removed it. 
> > I removed the relocate because I am thinking it should be absolute calls in the else block of `MacroAssembler::rt_call` [1], so no reloc required, as you mentioned in your previous comments. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L5027 > >> It seem like we need two version of rt_call one with address and one with Address. Then it seem like we could remove far_call as the rt_call would do the right thing. >> >> I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch. Is there a reason we need to include such in changes in this PR? > > On holiday this week, we can discuss further next week :-) > We still need relocates rt_call, not sure why you removed it. It seem like we need two version of rt_call one with address and one with Address. Then it seem like we could remove far_call as the rt_call would do the right thing. I guess maybe it's easier to only consider `call` and `rt_call` as a first step? And I prefer to keep `far_call` and `far_jump` as I think they are kind of different from rt_call. These two only handle the case when target is within code cache [1][2] as compared to rt_call which handles both code-cache and non-code-cache targets. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3173 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3188 > I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch. Is there a reason we need to include such in changes in this PR? Another concern which also leads me to consider unifying `call` and `rt_call` is possible performance issues. I see some `call` are changed into `rt_call` in this PR like in file stubGenerator_riscv.cpp and templateInterpreterGenerator_riscv.cpp. 
As I mentioned in my previous comments, `rt_call` would emit a fixed-size `movptr` sequence (6 uncompressed instructions) for these call sites which invokes some C++ VM functions. But it's still possible for the original `call` to emit a more simpler auipc + jalr depending on the `is_32bit_offset_from_codecache` check in `la`, right? Hopefully, we can get rid of this issue with my proposed change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1590581405 From rehn at openjdk.org Mon May 6 06:43:52 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 6 May 2024 06:43:52 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> Message-ID: On Mon, 6 May 2024 06:18:45 GMT, Fei Yang wrote: >>> We still need relocates rt_call, not sure why you removed it. >> >> I removed the relocate because I am thinking it should be absolute calls in the else block of `MacroAssembler::rt_call` [1], so no reloc required, as you mentioned in your previous comments. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L5027 >> >>> It seem like we need two version of rt_call one with address and one with Address. Then it seem like we could remove far_call as the rt_call would do the right thing. >>> >>> I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch. Is there a reason we need to include such in changes in this PR? >> >> On holiday this week, we can discuss further next week :-) > >> We still need relocates rt_call, not sure why you removed it. It seem like we need two version of rt_call one with address and one with Address. 
Then it seem like we could remove far_call as the rt_call would do the right thing. > > I guess maybe it's easier to only consider `call` and `rt_call` as a first step? And I prefer to keep `far_call` and `far_jump` as I think they are kind of different from rt_call. These two only handle the case when target is within code cache [1][2] as compared to rt_call which handles both code-cache and non-code-cache targets. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3173 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3188 > >> I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch. Is there a reason we need to include such in changes in this PR? > > Another concern which also leads me to consider unifying `call` and `rt_call` is possible performance issues. I see some `call` are changed into `rt_call` in this PR like in file stubGenerator_riscv.cpp and templateInterpreterGenerator_riscv.cpp. As I mentioned in my previous comments, `rt_call` would emit a fixed-size `movptr` sequence (6 uncompressed instructions) for these call sites which invokes some C++ VM functions. But it's still possible for the original `call` to emit a more simpler auipc + jalr depending on the `is_32bit_offset_from_codecache` check in `la`, right? Hopefully, we can get rid of this issue with my proposed change. Yes, sorry I didn't explain that: The cases in stubGenerator are all slow-path so we don't care much about performance. 
Calling: - MacroAssembler::debug64: debug only - JavaThread::check_special_condition_for_native_trans: when the suspend flag is set - SharedRuntime::reguard_yellow_pages: reguards pages => syscall - Interpreter::trace_code(t->tos_in()): calls an interpreter trace method for the bytecode - __ rt_call(runtime_entry): calls the runtime when throwing an exception ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1590595509 From fyang at openjdk.org Mon May 6 07:15:52 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 May 2024 07:15:52 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> Message-ID: <_OTT8xH0k6eRttntYvhgKJr5KSBJ3q6dmJUm2AynLD4=.9cfaad8d-2918-4a55-bf84-9bf303236f9f@github.com> On Mon, 6 May 2024 06:38:47 GMT, Robbin Ehn wrote: >>> We still need relocates rt_call, not sure why you removed it. It seem like we need two version of rt_call one with address and one with Address. Then it seem like we could remove far_call as the rt_call would do the right thing. >> >> I guess maybe it's easier to only consider `call` and `rt_call` as a first step? And I prefer to keep `far_call` and `far_jump` as I think they are kind of different from rt_call. These two only handle the case when target is within code cache [1][2] as compared to rt_call which handles both code-cache and non-code-cache targets. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3173 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L3188 >> >>> I like your idea, and we should do that, but it seems like it's not trivial just to add to this patch.
Is there a reason we need to include such in changes in this PR? >> >> Another concern which also leads me to consider unifying `call` and `rt_call` is possible performance issues. I see some `call` are changed into `rt_call` in this PR like in file stubGenerator_riscv.cpp and templateInterpreterGenerator_riscv.cpp. As I mentioned in my previous comments, `rt_call` would emit a fixed-size `movptr` sequence (6 uncompressed instructions) for these call sites which invokes some C++ VM functions. But it's still possible for the original `call` to emit a more simpler auipc + jalr depending on the `is_32bit_offset_from_codecache` check in `la`, right? Hopefully, we can get rid of this issue with my proposed change. > > Yes, sorry I didn't explain that: > The cases in stubGenerator are all slow-path so we don't care much about performance. > Calling: > - MacroAssembler::debug64 debug > - JavaThread::check_special_condition_for_native_trans when suspend flag is set > - SharedRuntime::reguard_yellow_pages regaurd pages => syscall > - Interpreter::trace_code(t->tos_in()) Calling a interpreter trace method for the bytcode > - __ rt_call(runtime_entry); calling runtime when throwing an exception All right. What about the one in `MacroAssembler::call_VM_leaf_base`? The `call` there is replaced with movptr+jalr. Why not use `rt_call` there like the other places? Then we won't lose it when we eliminate `call` finally. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1590622872 From aboldtch at openjdk.org Mon May 6 07:23:14 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 6 May 2024 07:23:14 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v5] In-Reply-To: References: Message-ID: > This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. 
[JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Default to non generational ZGC with JVMCI ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18393/files - new: https://git.openjdk.org/jdk/pull/18393/files/8fac84cc..4de30da2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18393&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18393&range=03-04 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18393/head:pull/18393 PR: https://git.openjdk.org/jdk/pull/18393 From rehn at openjdk.org Mon May 6 07:38:53 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 6 May 2024 07:38:53 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v2] In-Reply-To: <_OTT8xH0k6eRttntYvhgKJr5KSBJ3q6dmJUm2AynLD4=.9cfaad8d-2918-4a55-bf84-9bf303236f9f@github.com> References: <1UZeWIQJIEYbPetxWPlhQffyAy4gWXvNiV79i4_3pMQ=.86fb3068-940b-49ea-a2ea-b84a865d4cca@github.com> <0gMQgeYKyAzms64-hBIrltqUSfetu3Kczwr7IwLmF18=.8f583ac0-afff-4f1b-985f-a688cd898ae3@github.com> <4iLVM5rBRUo43EgY72DPBxJJ3qaHC4Nx_aWBUW9pIM8=.1f7cdee2-15d8-4b0f-b4ac-082f23198d8e@github.com> <_OTT8xH0k6eRttntYvhgKJr5KSBJ3q6dmJUm2AynLD4=.9cfaad8d-2918-4a55-bf84-9bf303236f9f@github.com> Message-ID: On Mon, 6 May 2024 07:13:20 GMT, Fei Yang wrote: >> Yes, sorry I didn't explain that: >> The cases in stubGenerator are all slow-path so we don't care much about performance. >> Calling: >> - MacroAssembler::debug64 debug >> - JavaThread::check_special_condition_for_native_trans when suspend flag is set >> - SharedRuntime::reguard_yellow_pages regaurd pages => syscall >> - Interpreter::trace_code(t->tos_in()) Calling a interpreter trace method for the bytcode >> - __ rt_call(runtime_entry); calling runtime when throwing an exception > > All right. 
What about the one in `MacroAssembler::call_VM_leaf_base`? The `call` there is replaced with movptr+jalr. Why not use `rt_call` there like the other places? Then we won't lose it when we eliminate `call` finally. Yes, you're right. The old code was: call() = mv(tmp, adr) => li(adr) + jalr(), and li(adr) is better than movptr(). I'll fix it, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1590644162 From stefank at openjdk.org Mon May 6 08:16:56 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 May 2024 08:16:56 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 03:33:30 GMT, Liming Liu wrote: >> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Fix the wrong condition Good that you found that `!UseTransparentHugePages` bug. ------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18592#pullrequestreview-2040231478 From stefank at openjdk.org Mon May 6 08:35:55 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 May 2024 08:35:55 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer In-Reply-To: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: On Fri, 3 May 2024 14:01:34 GMT, Martin Doerr wrote: > `index_oop_from_field_offset_long` is sometimes used to access an absolute address by using `p == nullptr`. Unfortunately, `nullptr + byte_offset` implies undefined behavior and should better get fixed. UBSan complains about it (see JBS issue). > A possible solution is to replace pointer arithmetic by integer arithmetic. We can use unsigned because `assert_field_offset_sane` checks that `byte_offset >= 0`. Changes requested by stefank (Reviewer). 
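The proposed switch from pointer arithmetic to integer arithmetic can be illustrated in isolation. The following is a minimal sketch, not the code from unsafe.cpp: `address_at` is a hypothetical helper, and it assumes, as the description states for `assert_field_offset_sane`, that the byte offset is non-negative. Converting both operands to `uintptr_t` before adding means no arithmetic is ever performed on a null pointer.

```cpp
#include <cstdint>

// Hypothetical helper: combine a possibly-null base with a non-negative byte
// offset. The addition happens on unsigned integers, so a null base simply
// yields the offset as an absolute address.
inline void* address_at(void* base, std::int64_t byte_offset) {
  std::uintptr_t base_address = reinterpret_cast<std::uintptr_t>(base);
  std::uintptr_t offset = static_cast<std::uintptr_t>(byte_offset);
  return reinterpret_cast<void*>(base_address + offset);
}
```

With a null base the result is just the offset reinterpreted as an address, which matches the "absolute address" use case mentioned in the description.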
src/hotspot/share/prims/unsafe.cpp line 158: > 156: assert_field_offset_sane(p, field_offset); > 157: uintptr_t base_address = cast_from_oop<uintptr_t>(p), > 158: byte_offset = (uintptr_t)field_offset_to_byte_offset(field_offset); We tend not to use this style for setting up variables in HotSpot code. I propose that you update the code to: Suggestion: uintptr_t base_address = cast_from_oop<uintptr_t>(p); uintptr_t byte_offset = (uintptr_t)field_offset_to_byte_offset(field_offset); ------------- PR Review: https://git.openjdk.org/jdk/pull/19087#pullrequestreview-2040261999 PR Review Comment: https://git.openjdk.org/jdk/pull/19087#discussion_r1590699897 From rehn at openjdk.org Mon May 6 08:36:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 6 May 2024 08:36:07 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: References: Message-ID: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> > Hi, please consider. > > We have code that directly uses the asm for calls/jumps instead of masm. > Our masm has somewhat odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. > Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have a small enough application. > The fallback for these is to use the GOT (your C compiler should place a copy of the GOT every 2G so it's always reachable). > We don't have a GOT; instead we materialize, so there are still differences between these and ours.
> > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. > - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - VM leaf should use li - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - Corrected method name - Missed a ws - JALR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/cb5ec446..6b3e4c47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=03-04 Stats: 5767 lines in 215 files changed: 3298 ins; 1010 del; 1459 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From jkratochvil at openjdk.org Mon May 6 08:48:53 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Mon, 6 May 2024 08:48:53 GMT Subject: RFR: 8331352: error: template-id not allowed for constructor/destructor in C++20 In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 02:01:01 GMT, Jan Kratochvil wrote: > When compiling trunk (819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5 2024-04-29) by gcc-14.0.1-0.15.fc40.x86_64 there are many errors: > > In file included from src/hotspot/share/memory/allocation.hpp:30, > from src/hotspot/share/ci/ciBaseObject.hpp:29, > from src/hotspot/share/ci/ciMetadata.hpp:28, > from src/hotspot/share/ci/ciType.hpp:28, > from src/hotspot/share/ci/ciKlass.hpp:28, > from src/hotspot/share/ci/ciArrayKlass.hpp:28, > from src/hotspot/share/ci/ciArray.hpp:28, > from src/hotspot/share/ci/compilerInterface.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.cpp:25: > src/hotspot/share/utilities/linkedlist.hpp:85:15: error: template-id not allowed for constructor in C++20 [-Werror=template-id-cdtor] > 85 | NONCOPYABLE(LinkedList); > | ^~~~~~~~~~~~~ > 
src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ‘NONCOPYABLE’ > 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */ > | ^ > src/hotspot/share/utilities/linkedlist.hpp:85:15: note: remove the ‘< >’ > 85 | NONCOPYABLE(LinkedList); > | ^~~~~~~~~~~~~ > src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ‘NONCOPYABLE’ > 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */ > | ^ > > In file included from src/hotspot/share/gc/z/zGranuleMap.inline.hpp:30, > from src/hotspot/share/gc/z/zForwardingTable.inline.hpp:32, > from src/hotspot/share/gc/z/zHeap.inline.hpp:30, > from src/hotspot/share/gc/z/zGeneration.inline.hpp:30, > from src/hotspot/share/gc/z/zBarrier.inline.hpp:30, > from src/hotspot/share/gc/z/zBarrierSet.inline.hpp:31, > from src/hotspot/share/gc/shared/barrierSetConfig.inline.hpp:44, > from src/hotspot/share/oops/access.inline.hpp:31, > from src/hotspot/share/memory/iterator.inline.hpp:32, > from src/hotspot/share/oops/oop.inline.hpp:31, > from src/hotspot/share/compiler/abstractDisassembler.cpp:32: > src/hotspot/share/gc/z/zArray.inline.hpp:99:21: error: template-id not allowed f... Ping for a second reviewer: the JDK currently does not build, and carrying the off-trunk patches around is a bit annoying. The patch should then also be backported to the JDK LTS releases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19009#issuecomment-2095476743 From duke at openjdk.org Mon May 6 08:52:57 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 6 May 2024 08:52:57 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: On Fri, 3 May 2024 18:58:55 GMT, Ludovic Henry wrote: >>> Hi, Do you have plan to implement intrinsic `VectorCmpMasked`?
It's part of `vectorizedMismatch` >> >> Hi @Hamlin-Li, >> >> I don't have such plan for the moment. Why do you think it should be a part of `_vectorizedMismatch` intrinsic? The similar [fix](https://github.com/openjdk/jdk/commit/b05c40ca3b5fd34cbbc7a9479b108a4ff2c099f1?diff=split&w=0) for X64 ([JDK-8266951](https://bugs.openjdk.org/browse/JDK-8266951)) looks like natural enhancement/followup for the original intrinsic functionality. > > @ygaevsky the `VectorCmpMasked` is to support partial inlining for small arrays: https://github.com/openjdk/jdk/blob/b33096f887108c3d7e1f4e62689c2b10401234fa/src/hotspot/share/opto/library_call.cpp#L6372-L6411 > > It very much complements this intrinsic and allows it to focus on larger arrays. @luhenry: I fully agree that we need `VectorCmpMasked` but I just want to understand why it couldn't be implemented as follow-up (similarly to x64). ------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2095483470 From jsjolen at openjdk.org Mon May 6 09:00:55 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 6 May 2024 09:00:55 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 03:33:30 GMT, Liming Liu wrote: >> The testcase failed on Oracle CI since JDK-8315923. The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Fix the wrong condition LGTM, thank you. ------------- Marked as reviewed by jsjolen (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18592#pullrequestreview-2040304022 From stefank at openjdk.org Mon May 6 09:12:53 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 May 2024 09:12:53 GMT Subject: RFR: 8331352: error: template-id not allowed for constructor/destructor in C++20 In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 02:01:01 GMT, Jan Kratochvil wrote: > When compiling trunk (819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5 2024-04-29) by gcc-14.0.1-0.15.fc40.x86_64 there are many errors: > > In file included from src/hotspot/share/memory/allocation.hpp:30, > from src/hotspot/share/ci/ciBaseObject.hpp:29, > from src/hotspot/share/ci/ciMetadata.hpp:28, > from src/hotspot/share/ci/ciType.hpp:28, > from src/hotspot/share/ci/ciKlass.hpp:28, > from src/hotspot/share/ci/ciArrayKlass.hpp:28, > from src/hotspot/share/ci/ciArray.hpp:28, > from src/hotspot/share/ci/compilerInterface.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.cpp:25: > src/hotspot/share/utilities/linkedlist.hpp:85:15: error: template-id not allowed for constructor in C++20 [-Werror=template-id-cdtor] > 85 | NONCOPYABLE(LinkedList<E>); > | ^~~~~~~~~~~~~ > src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ‘NONCOPYABLE’ > 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */ > | ^ > src/hotspot/share/utilities/linkedlist.hpp:85:15: note: remove the ‘< >’ > 85 | NONCOPYABLE(LinkedList<E>); > | ^~~~~~~~~~~~~ > src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ‘NONCOPYABLE’
> 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */ > | ^ > > In file included from src/hotspot/share/gc/z/zGranuleMap.inline.hpp:30, > from src/hotspot/share/gc/z/zForwardingTable.inline.hpp:32, > from src/hotspot/share/gc/z/zHeap.inline.hpp:30, > from src/hotspot/share/gc/z/zGeneration.inline.hpp:30, > from src/hotspot/share/gc/z/zBarrier.inline.hpp:30, > from src/hotspot/share/gc/z/zBarrierSet.inline.hpp:31, > from src/hotspot/share/gc/shared/barrierSetConfig.inline.hpp:44, > from src/hotspot/share/oops/access.inline.hpp:31, > from src/hotspot/share/memory/iterator.inline.hpp:32, > from src/hotspot/share/oops/oop.inline.hpp:31, > from src/hotspot/share/compiler/abstractDisassembler.cpp:32: > src/hotspot/share/gc/z/zArray.inline.hpp:99:21: error: template-id not allowed f... Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19009#pullrequestreview-2040324151 From jsjolen at openjdk.org Mon May 6 09:15:22 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 6 May 2024 09:15:22 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v6] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. > > > Some example code: > ```c++ > // Before this patch this worked: > GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s > int& x = arr.at(7); > if (x == -1) { > x = 2; > } > assert(arr.at(7) == 2, "this holds"); > // but this was forbidden > int& x = arr.at_grow(9, -1); // Compilation error!
at_grow returns E, not E& > // so we had to do > int x = arr.at_grow(9, -1); > if (x == -1) { > arr.at_put(9, 2); > } > > > Thanks. Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Same formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18975/files - new: https://git.openjdk.org/jdk/pull/18975/files/3dc21ec4..7a575e5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18975/head:pull/18975 PR: https://git.openjdk.org/jdk/pull/18975 From stefank at openjdk.org Mon May 6 09:15:23 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 May 2024 09:15:23 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v6] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 09:12:37 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. >> >> >> Some example code: >> ```c++ >> // Before this patch this worked: >> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s >> int& x = arr.at(7); >> if (x == -1) { >> x = 2; >> } >> assert(arr.at(7) == 2, "this holds"); >> // but this was forbidden >> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E& >> // so we had to do >> int x = arr.at_grow(9, -1); >> if (x == -1) { >> arr.at_put(9, 2); >> } >> >> >> Thanks.
> > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Same formatting Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18975#pullrequestreview-2040325662 From epeter at openjdk.org Mon May 6 09:22:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 09:22:53 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v6] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 09:15:22 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. >> >> >> Some example code: >> ```c++ >> // Before this patch this worked: >> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s >> int& x = arr.at(7); >> if (x == -1) { >> x = 2; >> } >> assert(arr.at(7) == 2, "this holds"); >> // but this was forbidden >> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E& >> // so we had to do >> int x = arr.at_grow(9, -1); >> if (x == -1) { >> arr.at_put(9, 2); >> } >> >> >> Thanks. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Same formatting Can you add a regression test that checks exactly the example that you have in your PR description?
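A check of the kind requested here could look roughly like the following sketch. It does not use HotSpot's `GrowableArray` or its gtest harness; `MiniGrowable` and `reference_from_at_grow_is_writable` are hypothetical stand-ins introduced only for illustration, assuming `at_grow` returns `E&` as the PR proposes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of a growable array. This is NOT HotSpot's
// GrowableArray; it only mimics the two accessors discussed in the thread.
template <typename E>
class MiniGrowable {
  std::vector<E> _data;

 public:
  MiniGrowable(std::size_t initial_len, const E& filler)
      : _data(initial_len, filler) {}

  // Bounds-checked element access by reference.
  E& at(std::size_t i) {
    assert(i < _data.size() && "index out of bounds");
    return _data[i];
  }

  // Grow with `filler` until index i is valid, then return a reference
  // to the element (the behavior the PR proposes for at_grow).
  E& at_grow(std::size_t i, const E& filler) {
    if (i >= _data.size()) {
      _data.resize(i + 1, filler);
    }
    return _data[i];
  }

  std::size_t length() const { return _data.size(); }
};

// Mirrors the example from the PR description: write through the reference
// returned by at_grow and observe the change through at().
static bool reference_from_at_grow_is_writable() {
  MiniGrowable<int> arr(8, -1);  // pre-filled with eight -1s
  int& x = arr.at_grow(9, -1);   // grows the array to length 10
  if (x == -1) {
    x = 2;
  }
  return arr.at(9) == 2 && arr.length() == 10;
}
```

In this stand-in, a reference obtained from `at_grow` is invalidated by a later call that grows the storage again, so a test should read through it before growing further.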
------------- PR Comment: https://git.openjdk.org/jdk/pull/18975#issuecomment-2095537247 From epeter at openjdk.org Mon May 6 09:25:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 May 2024 09:25:57 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v6] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 09:15:22 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. >> >> >> Some example code: >> ```c++ >> // Before this patch this worked: >> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s >> int& x = arr.at(7); >> if (x == -1) { >> x = 2; >> } >> assert(arr.at(7) == 2, "this holds"); >> // but this was forbidden >> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E& >> // so we had to do >> int x = arr.at_grow(9, -1); >> if (x == -1) { >> arr.at_put(9, 2); >> } >> >> >> Thanks. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Same formatting Otherwise, it looks good to me ------------- PR Comment: https://git.openjdk.org/jdk/pull/18975#issuecomment-2095542079 From ayang at openjdk.org Mon May 6 09:28:13 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 6 May 2024 09:28:13 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v3] In-Reply-To: References: Message-ID: > It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entry points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probably fail), fall back to full-gc.
> > Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outside the VM (by a third-party tool). > > Test: tier1-6 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into s1-do-collect - s1-do-collect ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19056/files - new: https://git.openjdk.org/jdk/pull/19056/files/7375bfbf..d8d5a13e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=01-02 Stats: 2776 lines in 84 files changed: 2057 ins; 489 del; 230 mod Patch: https://git.openjdk.org/jdk/pull/19056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19056/head:pull/19056 PR: https://git.openjdk.org/jdk/pull/19056 From ayang at openjdk.org Mon May 6 09:28:13 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 6 May 2024 09:28:13 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v2] In-Reply-To: References: <2E8psdsbHlnXaWjLMnhAHsoywFxY-jWEhHqAU4699_8=.83ba590a-2357-4924-a74a-e972b70b60da@github.com> Message-ID: <-Vn0Bjjlrq0l5PlHprZIFIETB5YUhw9MQLKh9cKO6LA=.262cec6e-865d-4d55-be6c-d60a08934d08@github.com> On Sat, 4 May 2024 02:39:38 GMT, Guoxiong Li wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> s1-do-collect > > src/hotspot/share/gc/serial/serialHeap.cpp line 461: > >> 459: if (should_verify && VerifyBeforeGC) { >> 460: prepare_for_verify(); >> 461: Universe::verify("Before GC"); > > Would it be better for the prefix of the verification log to specify minor or full GC? Such as `Before Minor GC` here.
Other `Universe::verify("` calls seem not to distinguish minor/major. > src/hotspot/share/gc/serial/serialHeap.cpp line 463: > >> 461: Universe::verify("Before GC"); >> 462: } >> 463: gc_prologue(false); > > The parameter `full` of the method `SerialHeap::gc_prologue` is not used. It seems to be a leftover of [JDK-8323993](https://bugs.openjdk.org/browse/JDK-8323993). True; can probably be fixed in a follow-up cleanup. > src/hotspot/share/gc/serial/serialHeap.cpp line 660: > >> 658: } >> 659: do_full_collection_no_gc_locker(clear_soft_refs); >> 660: } > > Please note the difference between the previous `SerialHeap::do_collection` and `SerialHeap::collect_at_safepoint_no_gc_locker` here. The previous `SerialHeap::do_collection` may invoke full GC according to the method `SerialHeap::should_do_full_collection` even if the young GC succeeded. But `SerialHeap::collect_at_safepoint_no_gc_locker` only invokes full GC when the young GC failed (because of failed promotion). This change leaves `SerialHeap::should_do_full_collection` with no users. If the behaviour of `SerialHeap::collect_at_safepoint_no_gc_locker` is your intention, I think it is good to remove `SerialHeap::should_do_full_collection`. Removed `should_do_full_collection`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1590740631 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1590740947 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1590741518 From eliu at openjdk.org Mon May 6 09:33:54 2024 From: eliu at openjdk.org (Eric Liu) Date: Mon, 6 May 2024 09:33:54 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 05:50:13 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction.
>> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following tests passed and show a definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into dev > - Update vm_version_aarch64.hpp > - 8331558: AArch64: optimize integer remainder > > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > > Add full platform coverage for Neoverse variants in vm_version.?pp src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 447: > 445: inline void msub(Register Rd, Register Rn, Register Rm, Register Ra) { > 446: if (VM_Version::supports_a53mac() && Ra != zr) > 447: nop(); This first appeared in JDK-8079203 [1]. May I ask what is special about a53mac?
[1] https://github.com/openjdk/jdk/commit/a65f9f95894e22ce2fd160024ce46f6aaa6c8bd3 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1590760051 From lmao at openjdk.org Mon May 6 09:36:59 2024 From: lmao at openjdk.org (Liang Mao) Date: Mon, 6 May 2024 09:36:59 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects Message-ID: The pre-write barrier of G1 is used to capture an object that is disconnected from the marking graph, which could be unmarked (aka *white*) and then stored into a *black* object, breaking the tri-color invariant. But references in newly allocated objects are created during object initialization after marking starts and can never be white. So we don't need a pre-write barrier for stores from newly allocated objects. The same mechanism is also used for barrier elimination in GenZGC. Additional testing: - [x] Linux aarch64 server release/fastdebug, test/hotspot/jtreg/gc with +UseG1GC - [x] Run several iterations of SPECjbb2015 with aggressively frequent concurrent mark ------------- Commit messages: - 8331711: G1 doesn't need pre write barrier for stores from new allocated objects Changes: https://git.openjdk.org/jdk/pull/19098/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19098&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331711 Stats: 7 lines in 1 file changed: 6 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19098/head:pull/19098 PR: https://git.openjdk.org/jdk/pull/19098 From eosterlund at openjdk.org Mon May 6 09:37:55 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 6 May 2024 09:37:55 GMT Subject: RFR: 8331285: Deprecate and obsolete OldSize [v2] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 07:34:22 GMT, Albert Mingkun Yang wrote: >> Simply deprecating a JVM flag.
> > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - merge > - review > - old-size Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18994#pullrequestreview-2040367127 From mdoerr at openjdk.org Mon May 6 09:42:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 6 May 2024 09:42:20 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: > `index_oop_from_field_offset_long` is sometimes used to access an absolute address by using `p == nullptr`. Unfortunately, `nullptr + byte_offset` implies undefined behavior and should better get fixed. UBSan complains about it (see JBS issue). > A possible solution is to replace pointer arithmetic by integer arithmetic. We can use unsigned because `assert_field_offset_sane` checks that `byte_offset >= 0`. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Change coding style. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19087/files - new: https://git.openjdk.org/jdk/pull/19087/files/24ca3361..c8bc69b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19087&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19087&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19087.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19087/head:pull/19087 PR: https://git.openjdk.org/jdk/pull/19087 From mdoerr at openjdk.org Mon May 6 09:42:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 6 May 2024 09:42:20 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: On Mon, 6 May 2024 08:32:58 GMT, Stefan Karlsson wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Change coding style. > > src/hotspot/share/prims/unsafe.cpp line 158: > >> 156: assert_field_offset_sane(p, field_offset); >> 157: uintptr_t base_address = cast_from_oop(p), >> 158: byte_offset = (uintptr_t)field_offset_to_byte_offset(field_offset); > > We tend to not use this style for setting up variables in HotSpot code: I propose that you update the code to: > Suggestion: > > uintptr_t base_address = cast_from_oop(p); > uintptr_t byte_offset = (uintptr_t)field_offset_to_byte_offset(field_offset); I couldn't find that in the hotspot style guide. Is that documented anywhere? We sometimes use it. Nevertheless, I've changed it. 
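The approach discussed in this thread, computing the address with integer arithmetic instead of offsetting a possibly-null pointer, can be sketched outside HotSpot roughly as follows. The helper name `address_from_base_and_offset` and the plain `void*` base are assumptions made for illustration; the actual patch works on an `oop` and the result of `field_offset_to_byte_offset`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the HotSpot helper: combine a base pointer
// (which may be null, meaning "absolute address") with a non-negative byte
// offset without ever doing pointer arithmetic on a null pointer.
static void* address_from_base_and_offset(void* base, int64_t byte_offset) {
  assert(byte_offset >= 0 && "offset is expected to be non-negative");
  // Convert both operands to unsigned integers first. Unsigned addition is
  // well defined, whereas adding a non-zero offset to a null pointer is
  // undefined behavior and is what UBSan reports.
  uintptr_t base_address = reinterpret_cast<uintptr_t>(base);
  uintptr_t offset = static_cast<uintptr_t>(byte_offset);
  return reinterpret_cast<void*>(base_address + offset);
}
```

With a null base the result is simply the offset reinterpreted as an address, which matches the "absolute address" use described above; with a non-null base it is the address of the field at that offset.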
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19087#discussion_r1590768283 From ayang at openjdk.org Mon May 6 09:43:57 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 6 May 2024 09:43:57 GMT Subject: RFR: 8331285: Deprecate and obsolete OldSize [v2] In-Reply-To: References: Message-ID: <_T4OL-tQLbhYzEX4N55qgS9ja3yHy3ssVRDKq5yTzss=.b51e5c89-aa3f-4953-9dd4-4b0db80e6d7f@github.com> On Tue, 30 Apr 2024 07:34:22 GMT, Albert Mingkun Yang wrote: >> Simple deprecating a jvm flag. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - merge > - review > - old-size Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18994#issuecomment-2095574188 From ayang at openjdk.org Mon May 6 09:43:58 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 6 May 2024 09:43:58 GMT Subject: Integrated: 8331285: Deprecate and obsolete OldSize In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 10:06:38 GMT, Albert Mingkun Yang wrote: > Simple deprecating a jvm flag. This pull request has now been integrated. 
Changeset: 9b0bb033 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/9b0bb03366642dd787b02809b3759ed714da9b81 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod 8331285: Deprecate and obsolete OldSize Reviewed-by: dholmes, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/18994 From lmao at openjdk.org Mon May 6 09:58:12 2024 From: lmao at openjdk.org (Liang Mao) Date: Mon, 6 May 2024 09:58:12 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: Message-ID: > The pre-write barrier of G1 is used to capture an object that is disconnected from the marking graph, which could be unmarked (aka *white*) and then stored into a *black* object, breaking the tri-color invariant. But references in newly allocated objects are created during object initialization after marking starts and can never be white. So we don't need a pre-write barrier for stores from newly allocated objects. The same mechanism is also used for barrier elimination in GenZGC. > > Additional testing: > - [x] Linux aarch64 server release/fastdebug, test/hotspot/jtreg/gc with +UseG1GC > - [x] Run several iterations of SPECjbb2015 with aggressively frequent concurrent mark Liang Mao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'openjdk/master' into 8331711 - 8331711: G1 doesn't need pre write barrier for stores from new allocated objects ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19098/files - new: https://git.openjdk.org/jdk/pull/19098/files/b34d6e7b..12681d73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19098&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19098&range=00-01 Stats: 122 lines in 12 files changed: 67 ins; 38 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/19098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19098/head:pull/19098 PR: https://git.openjdk.org/jdk/pull/19098 From shade at openjdk.org Mon May 6 10:07:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 May 2024 10:07:24 GMT Subject: RFR: 8331714: Make OopMapCache installation lock-free Message-ID: Trying to solve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572) runs into all sorts of lock ranking issues with `OopMapCacheAlloc_lock`. I think it would be a bit saner to rewrite the double-checked locking installation to atomic lock-free. OpenJDK code was using this lock since the initial load. There is a drawback that we might be trying to instantiate multiple `OopMapCache` instances from multiple threads. I think this is not a practical problem, as only a few threads would race here, and the allocation is relatively small (32*8 = 512 bytes). In imaginary worst^W nightmare case, with 100K threads racing we get a temporary native memory spike at +50M. 
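The lock-free installation pattern described in this message can be sketched with standard C++ atomics as follows. This is only an illustration under stated assumptions: `Cache`, `g_cache` and `get_or_install_cache` are hypothetical names, and `std::atomic` stands in for whatever atomic primitives the actual patch uses inside HotSpot.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the cache being installed; not the HotSpot class.
struct Cache {
  int slots[64] = {};
};

// Shared slot that starts out empty and is published at most once.
static std::atomic<Cache*> g_cache{nullptr};

// Return the installed cache, creating and publishing one if none exists.
// Several threads may allocate concurrently; exactly one allocation wins
// the compare-and-swap, and the losers free their copy and use the winner's.
static Cache* get_or_install_cache() {
  Cache* existing = g_cache.load(std::memory_order_acquire);
  if (existing != nullptr) {
    return existing;  // fast path: already installed
  }
  Cache* fresh = new Cache();
  Cache* expected = nullptr;
  if (g_cache.compare_exchange_strong(expected, fresh,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
    return fresh;  // this thread installed the cache
  }
  // Lost the race: `expected` now holds the pointer another thread installed.
  delete fresh;
  return expected;
}
```

The temporary extra allocation on the losing path is the drawback mentioned above: it is bounded by the number of threads that race on the first installation.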
Additional testing: - [ ] Linux x86_64 server fastdebug, `all` tests ------------- Commit messages: - Touchup comments - Fix Changes: https://git.openjdk.org/jdk/pull/19100/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19100&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331714 Stats: 12 lines in 3 files changed: 1 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19100.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19100/head:pull/19100 PR: https://git.openjdk.org/jdk/pull/19100 From duke at openjdk.org Mon May 6 10:10:55 2024 From: duke at openjdk.org (Jin Guojie) Date: Mon, 6 May 2024 10:10:55 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 09:30:45 GMT, Eric Liu wrote: >> Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into dev >> - Update vm_version_aarch64.hpp >> - 8331558: AArch64: optimize integer remainder >> >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> >> Add full platform coverage for Neoverse variants in vm_version.?pp > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 447: > >> 445: inline void msub(Register Rd, Register Rn, Register Rm, Register Ra) { >> 446: if (VM_Version::supports_a53mac() && Ra != zr) >> 447: nop(); > > It was in JDK-8079203 [1] for the first time. May I ask what's the specials on a53mac? > > [1] https://github.com/openjdk/jdk/commit/a65f9f95894e22ce2fd160024ce46f6aaa6c8bd3 This code entered the JDK in 2015. Frankly, I have no idea why an extra nop is needed on CPUs with the a53mac feature. 
Perhaps the author of patch a65f9f9589, enevill at openjdk.org, could explain? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1590798890 From rkennke at openjdk.org Mon May 6 11:04:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 6 May 2024 11:04:53 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v4] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Fri, 26 Apr 2024 11:22:03 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. >> >> Testing: >> - [x] tier1 (+CCP) >> - [x] tier1 (-CCP) >> - [x] tier2 (+CCP) >> - [x] tier2 (-CCP) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra whitespace Friendly ping? Could I get a review for this change? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18948#issuecomment-2095749376 From shade at openjdk.org Mon May 6 11:21:56 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 May 2024 11:21:56 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: <6hpNRaO10dRNfshEfX3RHSxY5lcxrLeU1wH7zUDqTlA=.40086db0-2a55-4735-8a78-13f154e9155d@github.com> On Thu, 2 May 2024 14:40:35 GMT, Aleksey Shipilev wrote: > `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). > > Doing this rename separately guarantees we have caught and renamed all current uses. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` Thanks all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19064#issuecomment-2095775823 From shade at openjdk.org Mon May 6 11:21:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 May 2024 11:21:57 GMT Subject: RFR: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: <8rTp30vldMrfGYMh6uP-tirE9bjNGTBePOSztx95MD4=.8f9cdead-0301-42c4-acad-a2bfc26b4702@github.com> References: <8rTp30vldMrfGYMh6uP-tirE9bjNGTBePOSztx95MD4=.8f9cdead-0301-42c4-acad-a2bfc26b4702@github.com> Message-ID: On Thu, 2 May 2024 17:26:58 GMT, Stefan Karlsson wrote: >> Ah, hm. Indeed! Separate PR? There is some light cleanup in G1 that can be associated with it. This PR would keep with just a mechanical rename. > > Sounds like a good idea. Filed: https://bugs.openjdk.org/browse/JDK-8331719 -- I'll give it out to some of our folks as a starter task. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19064#discussion_r1590876454 From shade at openjdk.org Mon May 6 11:21:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 May 2024 11:21:57 GMT Subject: Integrated: 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:40:35 GMT, Aleksey Shipilev wrote: > `CollectedHeap::is_gc_active()` is confusing, since its name implies _any_ GC phase is running, while it actually only covers the STW GCs. It would be good to rename it for clarity. The freed-up name, `is_gc_active` could then be repurposed to track any (concurrent or STW) GC phase running. That would be useful to resolve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572). > > Doing this rename separately guarantees we have caught and renamed all current uses. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` This pull request has now been integrated. Changeset: 1eec30a6 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/1eec30a6c03b7f4028405dc9bdb4d2a663b3987d Stats: 64 lines in 27 files changed: 0 ins; 2 del; 62 mod 8331573: Rename CollectedHeap::is_gc_active to be explicitly about STW GCs Reviewed-by: stefank, zgu, tschatzl, gli ------------- PR: https://git.openjdk.org/jdk/pull/19064 From tholenstein at openjdk.org Mon May 6 11:33:01 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 6 May 2024 11:33:01 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true Message-ID: The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. 
One `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` was missing and had to be added to make all tests (tier1-7) pass. ------------- Commit messages: - Update jfrIntrinsics.cpp - JDK-8329748: Change default value of AssertWXAtThreadSync to true Changes: https://git.openjdk.org/jdk/pull/19102/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19102&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329748 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19102.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19102/head:pull/19102 PR: https://git.openjdk.org/jdk/pull/19102 From ayang at openjdk.org Mon May 6 11:39:16 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 6 May 2024 11:39:16 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v4] In-Reply-To: References: Message-ID: > It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entry points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probably fail), fall back to full-gc. > > Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outside the VM (by a third-party tool). > > Test: tier1-6 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: - merge - review - Merge branch 'master' into s1-do-collect - s1-do-collect ------------- Changes: https://git.openjdk.org/jdk/pull/19056/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=03 Stats: 566 lines in 15 files changed: 125 ins; 356 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/19056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19056/head:pull/19056 PR: https://git.openjdk.org/jdk/pull/19056 From gli at openjdk.org Mon May 6 11:58:53 2024 From: gli at openjdk.org (Guoxiong Li) Date: Mon, 6 May 2024 11:58:53 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v2] In-Reply-To: <-Vn0Bjjlrq0l5PlHprZIFIETB5YUhw9MQLKh9cKO6LA=.262cec6e-865d-4d55-be6c-d60a08934d08@github.com> References: <2E8psdsbHlnXaWjLMnhAHsoywFxY-jWEhHqAU4699_8=.83ba590a-2357-4924-a74a-e972b70b60da@github.com> <-Vn0Bjjlrq0l5PlHprZIFIETB5YUhw9MQLKh9cKO6LA=.262cec6e-865d-4d55-be6c-d60a08934d08@github.com> Message-ID: On Mon, 6 May 2024 09:12:47 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/serial/serialHeap.cpp line 461: >> >>> 459: if (should_verify && VerifyBeforeGC) { >>> 460: prepare_for_verify(); >>> 461: Universe::verify("Before GC"); >> >> May the prefix of the verification log be better to specify the minor or full GC? Such as `Before Minor GC` here. > > Other `Universe::verify("` seems to not distinguish minor/major. OK. If someone want to change all of them in the future, she/he can file another ticket to follow up. >> src/hotspot/share/gc/serial/serialHeap.cpp line 463: >> >>> 461: Universe::verify("Before GC"); >>> 462: } >>> 463: gc_prologue(false); >> >> The parameter `full` of the method `SerialHeap::gc_prologue` doesn't been used. Seems a leftover of [JDK-8323993](https://bugs.openjdk.org/browse/JDK-8323993). > > True; can probably fixed in a followup cleanup. Filed https://bugs.openjdk.org/browse/JDK-8331723 to follow up. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1590915891 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1590915740 From aboldtch at openjdk.org Mon May 6 12:05:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 6 May 2024 12:05:57 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v4] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Fri, 26 Apr 2024 11:22:03 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. >> >> Testing: >> - [x] tier1 (+CCP) >> - [x] tier1 (-CCP) >> - [x] tier2 (+CCP) >> - [x] tier2 (-CCP) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra whitespace I can review and approve the correctness of the implementation. However I cannot comment on any performance implications. 
The whole `!UseSimpleArrayEquals` looks like it was design with (some specific) performance in mind. Maybe the bug fix should be done first, and let any performance implications be dealt with separately. Especially since the emitted code should be the same when running with`+UseCompressedClassPointers`. For me it reads better if the ```c++ if (extra_length != 0) { // [CODE] } if (is_8aligned) { // [CODE] } and ```c++ if (is_8aligned) { // [CODE] } if (extra_length != 0) { // [CODE] } were just turned into: ```c++ if (is_8aligned) { // [CODE] } else { assert(extra_length != 0, "maybe even assert this, not sure if needed"); // [CODE] } Side note on the performance: There is also [JDK-8328138](https://bugs.openjdk.org/browse/JDK-8328138) / #18292 which proposes another variant of array equals, which seems to be a more optimised version of the simple array equals. I think this speaks even more for just taking in the bug fix without evaluating the performance with `-UseCompressedClassPointers` src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5770: > 5768: sub(tmp5, zr, cnt1, LSL, 3 + log_elem_size); > 5769: ldr(tmp3, Address(pre(a1, start_offset))); > 5770: ldr(tmp4, Address(pre(a2, start_offset))); Is the use of `pre` intentional? If that is the case why would that be better than just a `reg + offset`? (Given that `a1` and `a2` are not used again) ------------- PR Review: https://git.openjdk.org/jdk/pull/18948#pullrequestreview-2040626704 PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1590922510 From zgu at openjdk.org Mon May 6 13:31:52 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 6 May 2024 13:31:52 GMT Subject: RFR: 8331714: Make OopMapCache installation lock-free In-Reply-To: References: Message-ID: On Mon, 6 May 2024 10:02:40 GMT, Aleksey Shipilev wrote: > Trying to solve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572) runs into all sorts of lock ranking issues with `OopMapCacheAlloc_lock`. 
I think it would be a bit saner to rewrite the double-checked locking installation to atomic lock-free. OpenJDK code was using this lock since the initial load. > > There is a drawback that we might be trying to instantiate multiple `OopMapCache` instances from multiple threads. I think this is not a practical problem, as only a few threads would race here, and the allocation is relatively small (32*8 = 512 bytes). In imaginary worst^W nightmare case, with 100K threads racing we get a temporary native memory spike at +50M. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` tests LGTM. Very similar to what I made in [PR #16074](https://github.com/openjdk/jdk/pull/16074) ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19100#pullrequestreview-2040789441 From stefank at openjdk.org Mon May 6 14:33:52 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 May 2024 14:33:52 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: On Mon, 6 May 2024 09:38:55 GMT, Martin Doerr wrote: >> src/hotspot/share/prims/unsafe.cpp line 158: >> >>> 156: assert_field_offset_sane(p, field_offset); >>> 157: uintptr_t base_address = cast_from_oop(p), >>> 158: byte_offset = (uintptr_t)field_offset_to_byte_offset(field_offset); >> >> We tend to not use this style for setting up variables in HotSpot code: I propose that you update the code to: >> Suggestion: >> >> uintptr_t base_address = cast_from_oop(p); >> uintptr_t byte_offset = (uintptr_t)field_offset_to_byte_offset(field_offset); > > I couldn't find that in the hotspot style guide. Is that documented anywhere? We sometimes use it. Nevertheless, I've changed it. I think I've read it somewhere, but I can't find it. 
I would prefer if we didn't use it in shared code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19087#discussion_r1591100786 From fyang at openjdk.org Mon May 6 14:46:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 May 2024 14:46:54 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> Message-ID: On Mon, 6 May 2024 08:36:07 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. 
the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - VM leaf should use li > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Corrected method name > - Missed a ws > - JALR src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 938: > 936: int32_t offset = 0; > 937: la(temp, dest, offset); > 938: Assembler::jalr(x1, temp, offset); Hi, did you check the possible impact on performance of this change too? old: call() => mv(tmp, adr) => li(tmp, adr) new: call() => la(tmp, adr) => auipc/movptr(tmp, adr) It's OK if we have auipc emitted by the new call(), which should not be worse then the old one. But the new call() could emit a fixed-sized movptr depending on the `is_32bit_offset_from_codecache` check which should be slower than a li(tmp, adr). 
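For readers following the auipc-versus-movptr point above: the auipc path is only usable when the target lies within a signed 32-bit PC-relative distance, and the 12-bit immediate that follows is sign-extended, so the upper part has to be rounded to compensate. A minimal sketch of that range check and the hi20/lo12 split, assuming RV64 and using hypothetical helper names (`split_pcrel`, `fits_pcrel32`) that are not HotSpot functions:

```c++
#include <cstdint>

// Hypothetical helpers for illustration only; these are not HotSpot APIs.
struct PcRelParts {
  int64_t hi20;  // immediate for auipc (signed, 20 bits)
  int64_t lo12;  // signed 12-bit immediate for the following jalr/addi
};

// True if auipc plus a signed 12-bit immediate can reach `target` from `pc`.
// Reachable range: [-2^31 - 2^11, 2^31 - 2^11 - 1].
inline bool fits_pcrel32(uint64_t pc, uint64_t target) {
  const int64_t offset = static_cast<int64_t>(target - pc);
  const int64_t lo = -(int64_t{1} << 31) - (int64_t{1} << 11);
  const int64_t hi = (int64_t{1} << 31) - (int64_t{1} << 11) - 1;
  return offset >= lo && offset <= hi;
}

// Split an in-range offset so that hi20 * 4096 + lo12 == offset.
inline PcRelParts split_pcrel(int64_t offset) {
  // Sign-extend the low 12 bits, as the hardware does for the I-type immediate.
  const int64_t lo12 = ((offset & 0xfff) ^ 0x800) - 0x800;
  // The remainder has its low 12 bits clear, so the division is exact.
  const int64_t hi20 = (offset - lo12) / 4096;
  return {hi20, lo12};
}
```

When `fits_pcrel32` is false, the address has to be materialized some other way, which is the case the `li` versus `movptr` question above is about.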
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1591114954 From stefank at openjdk.org Mon May 6 14:49:52 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 May 2024 14:49:52 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: <3ECiNSLQXPoEzKRXMxXTPX1VVZeeW06xKaIhdTRfvcM=.2c26252c-ff31-4f24-ba06-545af7b70ae0@github.com> On Mon, 6 May 2024 14:30:53 GMT, Stefan Karlsson wrote: >> I couldn't find that in the hotspot style guide. Is that documented anywhere? We sometimes use it. Nevertheless, I've changed it. > > I think I've read it somewhere, but I can't find it. I would prefer if we didn't use it in shared code. FWIW, I found that the C++ Core Guidelines mentions this: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-name-one So, it could have been there I read it and had it resonate with my view of the HotSpot code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19087#discussion_r1591124541 From rehn at openjdk.org Mon May 6 15:23:55 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 6 May 2024 15:23:55 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> Message-ID: On Mon, 6 May 2024 14:40:29 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - VM leaf should use li >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Corrected method name >> - Missed a ws >> - JALR > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 938: > >> 936: int32_t offset = 0; >> 937: la(temp, dest, offset); >> 938: Assembler::jalr(x1, temp, offset); > > Hi, did you check the possible impact on performance of this change too? > > old: call() => mv(tmp, adr) => li(tmp, adr) > new: call() => la(tmp, adr) => auipc/movptr(tmp, adr) > > It's OK if we have auipc emitted by the new call(), which should not be worse then the old one. But the new call() could emit a fixed-sized movptr depending on the `is_32bit_offset_from_codecache` check which should be slower than a li(tmp, adr). I checked that we use the use call with a code cache address for the places we care about such as: I.e. JNI FastGetField and far_call() But I now see I was mistaken regarding the interpreter, it seems like we don't have all the math implemented, so we do calls to SharedRuntime there, I'll fix those. Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1591169972 From mdoerr at openjdk.org Mon May 6 16:21:51 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 6 May 2024 16:21:51 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: <3ECiNSLQXPoEzKRXMxXTPX1VVZeeW06xKaIhdTRfvcM=.2c26252c-ff31-4f24-ba06-545af7b70ae0@github.com> References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> <3ECiNSLQXPoEzKRXMxXTPX1VVZeeW06xKaIhdTRfvcM=.2c26252c-ff31-4f24-ba06-545af7b70ae0@github.com> Message-ID: <-4KNPnonH3nMtX0pNZg4bDjT7vwTu__uA1ww20_Dt8Q=.2f33db11-0b9a-4a69-b1b1-710d0cb85c0f@github.com> On Mon, 6 May 2024 14:47:04 GMT, Stefan Karlsson wrote: >> I think I've read it somewhere, but I can't find it. I would prefer if we didn't use it in shared code. > > FWIW, I found that the C++ Core Guidelines mentions this: > https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-name-one > > So, it could have been there I read it and had it resonate with my view of the HotSpot code. Thanks for the pointer. I'm aware of the "C/C++ grammar" problems (especially when using pointers) which didn't exist in my simple use case. Anyway, it's already changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19087#discussion_r1591250111 From kvn at openjdk.org Mon May 6 16:54:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 May 2024 16:54:51 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:10:08 GMT, Tobias Holenstein wrote: > The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. 
Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. > > There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19102#pullrequestreview-2041236164 From rehn at openjdk.org Mon May 6 18:01:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 6 May 2024 18:01:07 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v6] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. > Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. > Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. > The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. 
> - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Use li instead of movptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/6b3e4c47..d8fbb00b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=04-05 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From rehn at openjdk.org Mon May 6 18:01:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 6 May 2024 18:01:07 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> Message-ID: On Mon, 6 May 2024 15:21:31 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 938: >> >>> 936: int32_t offset = 0; >>> 937: la(temp, dest, offset); >>> 938: Assembler::jalr(x1, temp, offset); >> >> Hi, did you check the possible impact on performance of this change too? 
>> >> old: call() => mv(tmp, adr) => li(tmp, adr) >> new: call() => la(tmp, adr) => auipc/movptr(tmp, adr) >> >> It's OK if we have auipc emitted by the new call(), which should not be worse then the old one. But the new call() could emit a fixed-sized movptr depending on the `is_32bit_offset_from_codecache` check which should be slower than a li(tmp, adr). > > I checked that we use the use call with a code cache address for the places we care about such as: > I.e. JNI FastGetField and far_call() > > But I now see I was mistaken regarding the interpreter, it seems like we don't have all the math implemented, so we do calls to SharedRuntime there, I'll fix those. > (e.g. setting -XX:ReservedCodeCacheSize=2047M some interpreter math will use movptr) > > Thanks! Great, thanks. I fixed it by using li() in la() when we can't use auipc, running tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1591368438 From daniel.smith at oracle.com Mon May 6 18:11:46 2024 From: daniel.smith at oracle.com (Dan Smith) Date: Mon, 6 May 2024 18:11:46 +0000 Subject: 2024 JVM Language Summit Message-ID: <3D31C07C-52B0-4526-BDE2-4E7C4F4D4732@oracle.com> 2024 JVM LANGUAGE SUMMIT -- CALL FOR SPEAKERS We are pleased to announce the 2024 JVM Language Summit to be held at Oracle's Santa Clara campus on August 5-7, 2024. Registration is now open for all attendees. Speaker submissions will be accepted through May 31. The JVM Language Summit is an open technical collaboration among language designers, compiler writers, tool builders, runtime engineers, and VM architects. We will share our experiences as creators of both the JVM and programming languages for the JVM. We also welcome non-JVM developers of similar technologies to attend or speak on their runtime, VM, or language of choice. Presentations will be recorded and made available to the public. This event is being organized by language and JVM engineers -- no marketers involved!
So bring your slide rules and be prepared for some seriously geeky discussions. Please review additional details at: https://jvmlangsummit.com To register: http://register.jvmlangsummit.com Questions: inquire2024 at jvmlangsummit.com The Summit will be followed by the OpenJDK Committers' Workshop on August 8-9. See https://openjdk.org/workshop for details. From coleenp at openjdk.org Mon May 6 18:25:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 6 May 2024 18:25:52 GMT Subject: RFR: 8331714: Make OopMapCache installation lock-free In-Reply-To: References: Message-ID: On Mon, 6 May 2024 10:02:40 GMT, Aleksey Shipilev wrote: > Trying to solve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572) runs into all sorts of lock ranking issues with `OopMapCacheAlloc_lock`. I think it would be a bit saner to rewrite the double-checked locking installation to atomic lock-free. OpenJDK code was using this lock since the initial load. > > There is a drawback that we might be trying to instantiate multiple `OopMapCache` instances from multiple threads. I think this is not a practical problem, as only a few threads would race here, and the allocation is relatively small (32*8 = 512 bytes). In imaginary worst^W nightmare case, with 100K threads racing we get a temporary native memory spike at +50M. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` tests This seems completely reasonable. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19100#pullrequestreview-2041405668 From jsjolen at openjdk.org Mon May 6 20:02:18 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 6 May 2024 20:02:18 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v60] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`.
Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
> > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Move comma - Store NCS:s on the side for 4-byte pointers to each NCS - Monotonic ordering of keys ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/c70203db..fc674b56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=59 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=58-59 Stats: 110 lines in 4 files changed: 57 ins; 23 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Mon May 6 20:09:14 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 6 May 2024 20:09:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v61] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Forgot to initialize ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/fc674b56..101fbf72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=60 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=59-60 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From kbarrett at openjdk.org Mon May 6 20:40:53 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 6 May 2024 20:40:53 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v6] In-Reply-To: References: Message-ID: <7njc5S0tLBOtD7vHuYnYHE2rTMc70x1AnC3CYWyuBkA=.f90b61fb-6e5e-4f58-a38a-bc43781caf18@github.com> On Mon, 6 May 2024 09:15:22 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. >> >> >> Some example code: >> ```c++ >> // Before this patch this worked: >> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s >> int& x = arr.at(7); >> if (x == -1) { >> x = 2; >> } >> assert(arr.at(7) == 2, "this holds"); >> // but this was forbidden >> int& x = arr.at_grow(9, -1); // Compilation error!
at_grow returns E, not E& >> // so we had to do >> int x = arr.at_grow(9, -1); >> if (x == -1) { >> arr.at_put(9, 2); >> } >> ``` >> >> Thanks. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Same formatting Marked as reviewed by kbarrett (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18975#pullrequestreview-2041636119 From kbarrett at openjdk.org Mon May 6 20:40:54 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 6 May 2024 20:40:54 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 12:55:18 GMT, Johan Sjölen wrote: >> Done! > > Alright, > > I'm reverting this change. The issue is that C1 and C2 aren't very `const`-correct, and fixing this would make the size of the PR blow up. OK. Can you file a followup issue to work on that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1591529916 From matsaave at openjdk.org Mon May 6 20:50:02 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 6 May 2024 20:50:02 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap Message-ID: The beginning of the RW region contains pointers to C++ vtables, which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dump time and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. These changes will reduce the number of pointers in the RW and RO regions and will allow the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests.
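The pointer-to-offset idea described above can be sketched roughly as follows. This is only an illustration under assumptions, not the CDS code from the PR: `to_offset` and `from_offset` are hypothetical helper names, and the "archive" is just a byte buffer. The point it shows is that an offset from the base stays valid wherever the region is mapped, so it never needs an entry in a relocation bitmap.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; not the names used in HotSpot.

// Dump time: turn an absolute pointer inside the archive into an offset
// from the archive's base address.
inline std::size_t to_offset(const void* base, const void* p) {
  auto b = reinterpret_cast<std::uintptr_t>(base);
  auto a = reinterpret_cast<std::uintptr_t>(p);
  assert(a >= b && "pointer must lie at or above the base");
  return static_cast<std::size_t>(a - b);
}

// Run time: recover the pointer from wherever the archive was mapped.
// The stored offset itself is position independent and needs no patching.
inline void* from_offset(void* runtime_base, std::size_t offset) {
  return static_cast<char*>(runtime_base) + offset;
}
```

A stored pointer would have to be patched whenever the archive is mapped at a different address; a stored offset would not, which is presumably why fewer pointers help the bitmap size optimizations mentioned above.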
------------- Commit messages: - Merge branch 'master' into pointer_to_offset_8329418 - Cleanup - Corrected SA - Editing SA - Fixed dynamic dumping - Now works with -Xshare:on - Adjusted serialization - Serializing offsets - 8329418: Replace pointers to tables with offsets in relocation bitmap Changes: https://git.openjdk.org/jdk/pull/19107/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19107&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329418 Stats: 121 lines in 8 files changed: 69 ins; 23 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/19107.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19107/head:pull/19107 PR: https://git.openjdk.org/jdk/pull/19107 From jsjolen at openjdk.org Mon May 6 21:14:21 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 6 May 2024 21:14:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v62] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability to register new virtual memory address spaces with NMT and to commit memory to them. The basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > ``` > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > ``` > > As you can see, no mapping between reserved regions and device-allocated memory is recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch also acts as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Remove friend class VMATree from Treap - Switch out manual loop in VMATree to functor iterators in Treap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/101fbf72..53695f81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=61 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=60-61 Stats: 112 lines in 4 files changed: 29 ins; 59 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From iklam at openjdk.org Mon May 6 21:52:11 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 6 May 2024 21:52:11 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v5] In-Reply-To: References: Message-ID: <0MXnQKyhTxKdjHFtsHZsw9yeUzVVtC1xDcaWdoZJJEA=.c511a113-15bb-4c7b-9b74-56ae92784cea@github.com> > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. 
> > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. > > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). 
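As a rough illustration of the design described above (one line reader, pluggable input sources, no fixed line-length limit), here is a minimal sketch. It is an assumption-laden stand-in, not the `inputStream` from the PR: the `Input` and `MemoryInput` names echo the mail, but the `read()` signature, the `LineReader` class, and the use of `std::string` are all hypothetical (HotSpot itself manages its own buffers rather than using the standard library).

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical source-of-bytes interface; the real one may differ.
class Input {
 public:
  virtual ~Input() = default;
  // Copy up to `size` bytes into `buf`; return the count, 0 at end of input.
  virtual std::size_t read(char* buf, std::size_t size) = 0;
};

// Reads from a block of memory.
class MemoryInput : public Input {
  const char* _base;
  std::size_t _limit;
  std::size_t _offset = 0;
 public:
  MemoryInput(const char* base, std::size_t size) : _base(base), _limit(size) {}
  std::size_t read(char* buf, std::size_t size) override {
    assert(_offset <= _limit);
    std::size_t n = std::min(size, _limit - _offset);
    std::memcpy(buf, _base + _offset, n);
    _offset += n;
    return n;
  }
};

// Source-agnostic line reader whose pending buffer grows as needed.
class LineReader {
  Input& _in;
  std::string _pending;  // bytes read but not yet returned
  bool _eof = false;
 public:
  explicit LineReader(Input& in) : _in(in) {}

  // Store the next line (without its '\n') in `line`.
  // Returns false once the input is exhausted.
  bool next(std::string& line) {
    for (;;) {
      std::size_t nl = _pending.find('\n');
      if (nl != std::string::npos) {
        line.assign(_pending, 0, nl);
        _pending.erase(0, nl + 1);
        return true;
      }
      if (_eof) {
        if (_pending.empty()) return false;
        line.swap(_pending);  // final line without a trailing newline
        _pending.clear();
        return true;
      }
      char chunk[16];  // deliberately small: lines may be longer than this
      std::size_t n = _in.read(chunk, sizeof chunk);
      if (n == 0) {
        _eof = true;
      } else {
        _pending.append(chunk, n);
      }
    }
  }
};
```

Supporting another source, such as a file or a socket, would only mean adding another `Input` subclass; the line-splitting logic stays in one place.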
Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - set _buffer to _small_buffer in InputStream constructor - removed Input::close() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18833/files - new: https://git.openjdk.org/jdk/pull/18833/files/bd7986e1..01fa3e38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=03-04 Stats: 33 lines in 2 files changed: 2 ins; 20 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/18833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18833/head:pull/18833 PR: https://git.openjdk.org/jdk/pull/18833 From dlong at openjdk.org Mon May 6 21:52:55 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 6 May 2024 21:52:55 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:10:08 GMT, Tobias Holenstein wrote: > The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. > > There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. src/hotspot/share/jfr/support/jfrIntrinsics.cpp line 77: > 75: void* JfrIntrinsicSupport::return_lease(JavaThread* jt) { > 76: DEBUG_ONLY(assert_precondition(jt);) > 77: MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, jt)); It seems like this could be moved down. It doesn't seem to be needed for the Java --> native transition. Is it needed for the JfrJavaEventWriter::flush() call? 
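To make the scoping question concrete, here is a small sketch of how a scoped guard confines the write state to the block that needs it. Everything in it is a stand-in: `ScopedWXMode` and `current_wx_mode` are hypothetical and only simulate the per-thread state, whereas the real `ThreadWXEnable` switches actual page permissions on macOS/AArch64.

```c++
#include <cassert>

enum WXMode { WXWrite, WXExec };

// Stand-in for the per-thread W^X state (hypothetical, simulation only).
thread_local WXMode current_wx_mode = WXExec;

// RAII guard: switch mode on construction, restore the old mode on exit.
class ScopedWXMode {
  WXMode _saved;
 public:
  explicit ScopedWXMode(WXMode m) : _saved(current_wx_mode) { current_wx_mode = m; }
  ~ScopedWXMode() { current_wx_mode = _saved; }
  ScopedWXMode(const ScopedWXMode&) = delete;
  ScopedWXMode& operator=(const ScopedWXMode&) = delete;
};

// Returns the mode seen by the code that runs before the guarded block.
WXMode narrow_scope_example() {
  WXMode seen_outside = current_wx_mode;  // unchanged: guard not yet active
  {
    ScopedWXMode wx(WXWrite);             // only this block runs in write mode
    assert(current_wx_mode == WXWrite);
  }                                       // previous mode restored here
  return seen_outside;
}
```

Declaring the guard at the top of the function instead would keep the whole function body in write mode, which is the situation the comment above is questioning.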
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19102#discussion_r1591592616 From dlong at openjdk.org Mon May 6 21:55:52 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 6 May 2024 21:55:52 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:10:08 GMT, Tobias Holenstein wrote: > The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. > > There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. Changes requested by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19102#pullrequestreview-2041740481 From dlong at openjdk.org Mon May 6 21:55:53 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 6 May 2024 21:55:53 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 21:50:03 GMT, Dean Long wrote: >> The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. >> >> There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. 
> > src/hotspot/share/jfr/support/jfrIntrinsics.cpp line 77: > >> 75: void* JfrIntrinsicSupport::return_lease(JavaThread* jt) { >> 76: DEBUG_ONLY(assert_precondition(jt);) >> 77: MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, jt)); > > It seems like this could be moved down. It doesn't seem to be needed for the Java --> native transition. Is it needed for the JfrJavaEventWriter::flush() call? If it is only needed for the native --> Java transition below, why don't we do it lazily? The interpreter and compilers already do this by calling check_special_condition_for_native_trans() only if a safepoint is detected. Normally we would want to be in the WXExec state when executing in _thread_in_native. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19102#discussion_r1591594916 From iklam at openjdk.org Mon May 6 21:59:55 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 6 May 2024 21:59:55 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v4] In-Reply-To: References: Message-ID: <5nyBey-sKrs3ywibYD9QSSp7Lhh1VxG75N0Z4K-0e20=.a996892a-70c6-47ac-b86f-7fb1873ebad6@github.com> On Thu, 2 May 2024 10:12:47 GMT, Johan Sj?len wrote: >> Ioi Lam has updated the pull request incrementally with three additional commits since the last revision: >> >> - BlockInputStream is used by gtest only, so moved it there >> - removed unused set_position(), etc >> - removed _must_free > > src/hotspot/share/utilities/istream.cpp line 173: > >> 171: assert(_buffer_size > 0, ""); >> 172: // and continue with at least a little buffer >> 173: } > > Get rid of this branch, small buffer now default. Fixed. > src/hotspot/share/utilities/istream.hpp line 236: > >> 234: _end(0), >> 235: _next(0), >> 236: _line_count(0) {} > > Explicitly initialize the `_small_buffer` (`0` it out?). Set `_buffer` and `_buffer_size` to the small buffer by default. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1591598064 PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1591597908 From iklam at openjdk.org Mon May 6 22:13:31 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 6 May 2024 22:13:31 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v6] In-Reply-To: References: Message-ID: > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. > > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. > > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. 
In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8330532-improve-line-oriented-text-parsing-in-hotspot - inputStream::_buffer can never be nullptr - set _buffer to _small_buffer in InputStream constructor - removed Input::close() - BlockInputStream is used by gtest only, so moved it there - removed unused set_position(), etc - removed _must_free - Merge branch 'master' of https://github.com/openjdk/jdk into 8330532-improve-line-oriented-text-parsing-in-hotspot - Comments fro @coleenp and @matias9927 - removed more unused code from istream.hpp - ... 
and 5 more: https://git.openjdk.org/jdk/compare/953bd4d9...6bad4971 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18833/files - new: https://git.openjdk.org/jdk/pull/18833/files/01fa3e38..6bad4971 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=04-05 Stats: 32959 lines in 1870 files changed: 14280 ins; 12248 del; 6431 mod Patch: https://git.openjdk.org/jdk/pull/18833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18833/head:pull/18833 PR: https://git.openjdk.org/jdk/pull/18833 From cjplummer at openjdk.org Mon May 6 22:21:53 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 May 2024 22:21:53 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v9] In-Reply-To: <7Aud9EX-Q09Bx3MmZjM182gBp9sDmbvIt7rSmtBa1FM=.cc43a81c-7431-484d-9eae-295da93c9a52@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> <7Aud9EX-Q09Bx3MmZjM182gBp9sDmbvIt7rSmtBa1FM=.cc43a81c-7431-484d-9eae-295da93c9a52@github.com> Message-ID: <3x1oThcCfOj6FR0ZJoH5ipYkrHTFAzrgJXm69Tggb8k=.83dba355-787a-4f05-a721-df5aee8fd810@github.com> On Sat, 4 May 2024 03:45:31 GMT, Lei Zaakjyu wrote: >> follow up 8267941 > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > review In SA I see references to heapRegionIterate() that possibly should be renamed. I noticed that the HeapRegionManager and HeapRegionClosure classes were not renamed (in the HotSpot source). Is this intentional or an oversight?
------------- PR Review: https://git.openjdk.org/jdk/pull/18871#pullrequestreview-2041768479 From cjplummer at openjdk.org Mon May 6 22:37:53 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 6 May 2024 22:37:53 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap In-Reply-To: References: Message-ID: On Mon, 6 May 2024 17:05:47 GMT, Matias Saavedra Silva wrote: > The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. > > These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java line 186: > 184: // }; > 185: // > 186: // The following loop compues the following Since there are lot of comments after this point, maybe the wording should instead be "The loop below...". Also, should be "computes" src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java line 216: > 214: for (int i=0; i < metadataTypeArray.length; i++) { > 215: long vtable_offset = vtablesIndex.getJLongAt(i * addressSize); // long offset = _index[i] > 216: System.out.printf("Offset: %x\n", vtable_offset); Remove printf(). test/hotspot/jtreg/serviceability/sa/TestSysProps.java line 68: > 66: } > 67: if (numProps != expectedCount) { > 68: throw new RuntimeException("Wrong number of " + cmdName + " properties: " + numProps + " Expected: " + expectedCount); I think it would be good to add parenthesis around the extra output you added. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19107#discussion_r1591620087 PR Review Comment: https://git.openjdk.org/jdk/pull/19107#discussion_r1591619830 PR Review Comment: https://git.openjdk.org/jdk/pull/19107#discussion_r1591620843 From sviswanathan at openjdk.org Mon May 6 22:43:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 6 May 2024 22:43:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 
0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1174: > 1172: // Alignment specifying the maximum number of allowed bytes to pad. > 1173: // If padding > max, no padding is inserted. > 1174: void MacroAssembler::p2align(int modulus, int maxbytes) { We could pass offset() as an argument to p2align. Basically have three arguments to p2align(modulus, target, maxbytes). Also maybe rename p2align as align then? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 208: > 206: //////////////////////////////////////////////////////////////////////////////////////// > 207: //////////////////////////////////////////////////////////////////////////////////////// > 208: if (VM_Version::supports_avx2()) { // AVX2 version Instead of the if check here, it would be better to do an assert here: assert (VM_Version::supports_avx2(), "Needs AVX2 support"); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 233: > 231: //////////////////////////////////////////////////////////////////////////////////////// > 232: //////////////////////////////////////////////////////////////////////////////////////// > 233: This comment can go right before the method start. Also good to add in the comment the native function parameters. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: > 236: const Register needle = rdx; > 237: const Register needle_len = rcx; > 238: This is the calling convention on Linux. How is windows platform handled? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 260: > 258: // const XMMRegister save_rcx = xmm11; > 259: // const XMMRegister save_r8 = xmm12; > 260: This could be removed? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 279: > 277: fnptrs[isLL ? StrIntrinsicNode::LL > 278: : isUU ? 
StrIntrinsicNode::UU > 279: : StrIntrinsicNode::UL] = __ pc(); Could this not be simplified as: fnptrs[ae] = __ pc(); src/hotspot/share/opto/library_call.cpp line 1263: > 1261: if (result != nullptr) { > 1262: // The result is index relative to from_index if substring was found, -1 otherwise. > 1263: // Generate code which will fold into cmove. Any reason to remove this comment? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591547667 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591612417 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591613215 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591617528 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591607921 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591618222 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591554296 From duke at openjdk.org Mon May 6 22:47:23 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 6 May 2024 22:47:23 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v6] In-Reply-To: References: Message-ID: > Performance. Before: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ±
23.547 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s > > Performance, no intrinsic: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ± 10.591 ops/s > > Performance, **with intrinsics*... Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: Use AffinePoint to exit Montgomery domain Style notes: Affine.equals() - Mismatched fields only appear to be used from testing, perhaps should be moved there instead Affine.getX(boolean)|getY(boolean) - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks - needed to convert Affine to Projective (need to stay in montgomery domain) ECOperations.PointMultiplier - changes could probably be restored to original (since ProjectivePoint handling no longer required) - consider these changes an improvement? (fewer nested classes) - was an inner-class but not using inner-class features (i.e.
ecOps variable should be converted) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18583/files - new: https://git.openjdk.org/jdk/pull/18583/files/a1984501..8ff243a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=04-05 Stats: 268 lines in 7 files changed: 89 ins; 147 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/18583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583 PR: https://git.openjdk.org/jdk/pull/18583 From sviswanathan at openjdk.org Mon May 6 23:21:57 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 6 May 2024 23:21:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314: > 312: > 313: // needle_len is in elements, not bytes, for UTF-16 > 314: __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX); OPT_NEEDLE_SIZE_MAX is an odd number (set to 5), should that have been an even number? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329: > 327: //////////////////////////////////////////////////////////////////////////////////////// > 328: > 329: __ bind(L_begin); So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes. A high level algorithm description here is needed in comments to follow the code below. 
It should describe the various paths in terms of haystack and needle sizes, and how to reason about the assembly code below to make sure that all the paths are taken care of. Also, the abstraction level suddenly changes here to detailed code below instead of methods for the various paths. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591640551 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591646095 From jinguojie.jgj at alibaba-inc.com Tue May 7 03:13:42 2024 From: jinguojie.jgj at alibaba-inc.com (Jin Guojie) Date: Tue, 07 May 2024 11:13:42 +0800 Subject: Reply: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: Hi Nevill, I submitted a patch that is related to the Arm64 MSUB instruction. Eric found that in the two lines of code you submitted in 2015, an extra nop is executed for CPUs with the a53mac feature. My patch just keeps these two lines of code unchanged. Could you provide some background on what is special about a53mac? Jin Guojie (Alibaba Group, HotSpot developer) On 2024/5/6 Jin Guojie wrote: > On Mon, 6 May 2024 09:30:45 GMT, Eric Liu wrote: >>> Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >>> >>> - Merge branch 'openjdk:master' into dev >>> - Update vm_version_aarch64.hpp >>> - 8331558: AArch64: optimize integer remainder >>> >>> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >>> - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >>> >>> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 447: >> >>> 445:
inline void msub(Register Rd, Register Rn, Register Rm, Register Ra) { >>> 446: ? ? if (VM_Version::supports_a53mac() && Ra != zr) >>> 447: ? ? ? nop(); >> >> It was in JDK-8079203 [1] for the first time. May I ask what's the specials on a53mac? >> >> [1] https://github.com/openjdk/jdk/commit/a65f9f95894e22ce2fd160024ce46f6aaa6c8bd3 > This code entered the JDK in 2015. Frankly, I have no idea why an extra nop is needed on CPUs with the a53mac feature. > Perhaps the author of patch a65f9f9589, enevill at openjdk.org, could explain? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1590798890 From iklam at openjdk.org Tue May 7 03:29:25 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 7 May 2024 03:29:25 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v7] In-Reply-To: References: Message-ID: > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. > > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. 
> > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: No need to call set_input(null_ptr) from inputStream destructor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18833/files - new: https://git.openjdk.org/jdk/pull/18833/files/6bad4971..2ddbfea9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18833/head:pull/18833 PR: https://git.openjdk.org/jdk/pull/18833 From stuefe at openjdk.org Tue May 7 04:21:57 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 7 May 2024 04:21:57 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v7] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 03:29:25 GMT, Ioi Lam wrote: >> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) >> >> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. 
Example: >> >> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 >> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 >> >> Common problems: >> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. >> - There's ad-hoc code that deals with `FILE*` differently than from memory. >> >> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. >> >> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. >> >> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) >> >> Credit: >> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . >> >> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > No need to call set_input(null_ptr) from inputStream destructor Drive-by comment: do we really need to combine input and output capabilities in one class, fileStream? Why not a dedicated fileOutputStream? APIs like setPosition() do not make much sense on an output stream. 
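For readers following along, the fixed-buffer problem listed in the quoted PR description can be illustrated with a small, self-contained sketch. This is not the `inputStream` API from the PR: `Input`, `MemoryInputSketch` and `LineReaderSketch` are hypothetical names, and `std::string` is used only to keep the example short (HotSpot code would use its own allocation). The point is just that the line buffer grows as needed and that the reader does not care where the bytes come from.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical source-of-bytes interface; a file- or socket-backed
// implementation would plug in the same way.
struct Input {
  virtual ~Input() = default;
  // Copies up to `size` bytes into `buf`; returns the count, 0 at end of input.
  virtual std::size_t read(char* buf, std::size_t size) = 0;
};

// Hypothetical in-memory implementation of Input.
struct MemoryInputSketch : Input {
  const char* _data;
  std::size_t _len;
  std::size_t _pos = 0;
  MemoryInputSketch(const char* data, std::size_t len) : _data(data), _len(len) {}
  std::size_t read(char* buf, std::size_t size) override {
    std::size_t left = _len - _pos;
    std::size_t n = left < size ? left : size;
    std::memcpy(buf, _data + _pos, n);
    _pos += n;
    return n;
  }
};

class LineReaderSketch {
  Input& _in;
  std::string _pending;  // bytes read from the input but not yet returned
  bool _eof = false;
 public:
  explicit LineReaderSketch(Input& in) : _in(in) {}

  // Stores the next line (without the '\n') in `line`.
  // Returns false once the input is exhausted.
  bool next(std::string& line) {
    for (;;) {
      std::size_t nl = _pending.find('\n');
      if (nl != std::string::npos) {
        line.assign(_pending, 0, nl);
        _pending.erase(0, nl + 1);
        return true;
      }
      if (_eof) {
        if (_pending.empty()) return false;
        line.swap(_pending);  // last line has no trailing newline
        _pending.clear();
        return true;
      }
      char chunk[8];  // deliberately tiny: lines may be far longer than this
      std::size_t n = _in.read(chunk, sizeof(chunk));
      if (n == 0) {
        _eof = true;
      } else {
        _pending.append(chunk, n);
      }
    }
  }
};
```

Keeping the reader input-only, as in this sketch, also sidesteps the question raised above about mixing input and output capabilities in one class.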
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18833#issuecomment-2097416228

From rehn at openjdk.org Tue May 7 05:47:10 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 7 May 2024 05:47:10 GMT
Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v7]
In-Reply-To:
References:
Message-ID:

> Hi, please consider.
>
> We have code that directly uses the asm for calls/jumps instead of masm.
> Our masm has somewhat odd naming, and we don't use 'proper' pseudoinstructions/mnemonics.
> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master)
>
>   j offset       jal x0, offset                                        Jump
>   jal offset     jal x1, offset                                        Jump and link
>   jr rs          jalr x0, rs, 0                                        Jump register
>   jalr rs        jalr x1, rs, 0                                        Jump and link register
>   ret            jalr x0, x1, 0                                        Return from subroutine
>   call offset    auipc x1, offset[31:12]; jalr x1, x1, offset[11:0]    Call far-away subroutine
>   tail offset    auipc x6, offset[31:12]; jalr x0, x6, offset[11:0]    Tail call far-away subroutine
>
> But these can only be implemented like this if you have a small enough application.
> The fallback for these is to use the GOT (your C compiler should place a copy of the GOT every 2G so it's always reachable).
> We don't have a GOT; instead we materialize, so there are still differences between these and ours.
>
> This patch:
> - Tries to follow these suggested mappings as well as we can.
> - Makes sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention)
> - To avoid confusion between MASM public/private methods, ASM methods and the mnemonics, there is some renaming.
>   E.g. the mnemonic jal means call offset; as we can't use that, there is no 'jal'.
> - I enabled c.j, but right now we never generate it.
> - As always, the macros do no good and are legacy from when the code base did not use templates.
(also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: REVERT: Use li instead of movptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/d8fbb00b..38bd4187 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From rehn at openjdk.org Tue May 7 05:47:10 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 May 2024 05:47:10 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> Message-ID: On Mon, 6 May 2024 17:58:02 GMT, Robbin Ehn wrote: >> I checked that we use the use call with a code cache address for the places we care about such as: >> I.e. JNI FastGetField and far_call() >> >> But I now see I was mistaken regarding the interpreter, it seems like we don't have all the math implemented, so we do calls to SharedRuntime there, I'll fix those. >> (e.g. setting -XX:ReservedCodeCacheSize=2047M some interpreter math will use movptr) >> >> Thanks! > > Great, thanks. 
> > I fixed it by using li() in la() when we can't use auipc, running tests. That was bad idea. Looking into something else. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1591841980 From rehn at openjdk.org Tue May 7 06:03:52 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 May 2024 06:03:52 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> Message-ID: On Tue, 7 May 2024 05:44:08 GMT, Robbin Ehn wrote: >> Great, thanks. >> >> I fixed it by using li() in la() when we can't use auipc, running tests. > > That was bad idea. > > Looking into something else. It's a bit annoying, since if we are about performance in the interpreter we should implement the math stubs, e.g. StubRoutines::dcos. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1591854389 From sjayagond at openjdk.org Tue May 7 06:06:56 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Tue, 7 May 2024 06:06:56 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v7] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 15:10:37 GMT, Sidraya Jayagond wrote: >> This PR Adds SIMD support on s390x. > > Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: > > PopCountVI supported by z14 onwards. Still working on addressing intrinsic vector register clobbering. Commenting to keep PR Open. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18162#issuecomment-2097515266 From rehn at openjdk.org Tue May 7 07:12:56 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 May 2024 07:12:56 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> Message-ID: <0CA1KZQ1O5QsAPTPb2qnk_LqAVjrpec3-fh5dXvZOG0=.dd163617-9cc2-443c-8727-ca3a723071c6@github.com> On Tue, 7 May 2024 06:01:35 GMT, Robbin Ehn wrote: >> That was bad idea. >> >> Looking into something else. > > It's a bit annoying, since if we are about performance in the interpreter we should implement the math stubs, e.g. StubRoutines::dcos. I don't find a nice way to get li for them. https://github.com/openjdk/jdk/blob/3b8227ba24c7bc05a8ea23801e3816e8fc80de4e/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp#L272 Any suggestions? As I said, if we want to be performant here we should implement the "StubRoutines::dpow()". What you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1591921090 From fyang at openjdk.org Tue May 7 07:28:52 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 May 2024 07:28:52 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: <0CA1KZQ1O5QsAPTPb2qnk_LqAVjrpec3-fh5dXvZOG0=.dd163617-9cc2-443c-8727-ca3a723071c6@github.com> References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> <0CA1KZQ1O5QsAPTPb2qnk_LqAVjrpec3-fh5dXvZOG0=.dd163617-9cc2-443c-8727-ca3a723071c6@github.com> Message-ID: On Tue, 7 May 2024 07:10:21 GMT, Robbin Ehn wrote: >> It's a bit annoying, since if we are about performance in the interpreter we should implement the math stubs, e.g. StubRoutines::dcos. > > I don't find a nice way to get li for them. 
> https://github.com/openjdk/jdk/blob/3b8227ba24c7bc05a8ea23801e3816e8fc80de4e/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp#L272 > > Any suggestions? > > As I said, if we want to be performant here we should implement the "StubRoutines::dpow()". > > What you think? Hmm .. It does look bad to me if we use mv instead of movptr like you do in your previous commit [1]. Seems that a mv is more reasonable than movptr in this case when addr is out of code cache, isn't it? And it's not solely about the currently call sites like in the interpreter, we should also consider future possible uses of `call`. [1] https://github.com/openjdk/jdk/pull/18942/commits/d8fbb00b51afbf6a3b7dcd828f623b4add949faf ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1591939129 From aph-open at littlepinkcloud.com Tue May 7 08:06:12 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Tue, 7 May 2024 09:06:12 +0100 Subject: Reply: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: <83f15dbc-bee5-470c-ae13-a60d4d0d6f32@littlepinkcloud.com> On 5/7/24 04:13, Jin Guojie wrote: > Could you provide some background explanation for what's the specials on a53mac? Type "Cortex A53 errata" into Google and have a read. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.org Tue May 7 08:19:53 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 7 May 2024 08:19:53 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v4] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Fri, 26 Apr 2024 11:22:03 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. >> >> Testing: >> - [x] tier1 (+CCP) >> - [x] tier1 (-CCP) >> - [x] tier2 (+CCP) >> - [x] tier2 (-CCP) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra whitespace This still looks too complicated for such a simple thing. Why not read the whole array, fully aligned, until the final word (at start+length*elementSize-wordSize) which is possibly unaligned? That would work regardless of alignment. Check the lengths agree first, and the first whole read may or may not include the length field. 
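For illustration only, here is a minimal C++ sketch of the scheme suggested above: check that the lengths agree, compare whole words from the start, then finish with one word read that ends exactly at the end of the data and may overlap the previous read. The names `ranges_equal` and `arrays_equal` are hypothetical. The sketch works on plain byte buffers rather than Java arrays, uses `memcpy` loads so that alignment is not a concern in portable C++, and falls back to a byte loop for inputs shorter than one word; it is not the AArch64 intrinsic and does not model the array header or length field.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: compares `len` bytes at `a` and `b` one 64-bit word at
// a time, finishing with a single overlapping read of the last word.
static bool ranges_equal(const unsigned char* a, const unsigned char* b, std::size_t len) {
  const std::size_t word = sizeof(std::uint64_t);
  if (len < word) {
    // Too short for a whole-word read in this sketch; compare byte by byte.
    for (std::size_t i = 0; i < len; i++) {
      if (a[i] != b[i]) return false;
    }
    return true;
  }
  std::uint64_t x, y;
  // Whole words from the start; never reads past the end of the data.
  for (std::size_t i = 0; i + word <= len; i += word) {
    std::memcpy(&x, a + i, word);
    std::memcpy(&y, b + i, word);
    if (x != y) return false;
  }
  // Final word, ending exactly at start + len. It may overlap bytes already
  // compared, which is harmless for an equality test.
  std::memcpy(&x, a + (len - word), word);
  std::memcpy(&y, b + (len - word), word);
  return x == y;
}

// Hypothetical wrapper: lengths are checked first, as suggested above.
static bool arrays_equal(const unsigned char* a, std::size_t len_a,
                         const unsigned char* b, std::size_t len_b) {
  if (len_a != len_b) return false;
  return ranges_equal(a, b, len_a);
}
```

When the length is a multiple of the word size the final read simply repeats the last loop iteration; a real implementation would likely skip it.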
------------- PR Comment: https://git.openjdk.org/jdk/pull/18948#issuecomment-2097719024 From jsjolen at openjdk.org Tue May 7 08:27:20 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 May 2024 08:27:20 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v63] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. 
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Do not do that many operations when verifying the treap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/53695f81..50f42e8c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=62 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=61-62 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From stefank at openjdk.org Tue May 7 08:28:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 7 May 2024 08:28:02 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: References: 
<73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: On Mon, 6 May 2024 09:42:20 GMT, Martin Doerr wrote: >> `index_oop_from_field_offset_long` is sometimes used to access an absolute address by using `p == nullptr`. Unfortunately, `nullptr + byte_offset` implies undefined behavior and should better get fixed. UBSan complains about it (see JBS issue). >> A possible solution is to replace pointer arithmetic by integer arithmetic. We can use unsigned because `assert_field_offset_sane` checks that `byte_offset >= 0`. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Change coding style. Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19087#pullrequestreview-2042422955 From stefank at openjdk.org Tue May 7 08:28:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 7 May 2024 08:28:02 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: <-4KNPnonH3nMtX0pNZg4bDjT7vwTu__uA1ww20_Dt8Q=.2f33db11-0b9a-4a69-b1b1-710d0cb85c0f@github.com> References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> <3ECiNSLQXPoEzKRXMxXTPX1VVZeeW06xKaIhdTRfvcM=.2c26252c-ff31-4f24-ba06-545af7b70ae0@github.com> <-4KNPnonH3nMtX0pNZg4bDjT7vwTu__uA1ww20_Dt8Q=.2f33db11-0b9a-4a69-b1b1-710d0cb85c0f@github.com> Message-ID: On Mon, 6 May 2024 16:19:29 GMT, Martin Doerr wrote: > I'm aware of the "C/C++ grammar" problems (especially when using pointers) which didn't exist in my simple use case. Of course. I didn't try to imply that you didn't. > Anyway, it's already changed. Thanks. 
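As a side note for readers, the idea described in the quoted PR text (integer arithmetic instead of `nullptr + byte_offset`) can be sketched as follows. This is not the code from unsafe.cpp: `address_from_base_and_offset` is a hypothetical name, and the sketch assumes, as the quoted description says `assert_field_offset_sane` guarantees, that the offset is non-negative. It also relies on a null pointer converting to the integer 0, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cstdint>

// Hypothetical stand-in for the address computation. `base` may be null, in
// which case `byte_offset` is effectively an absolute address.
static void* address_from_base_and_offset(void* base, std::int64_t byte_offset) {
  // Assumption: byte_offset >= 0, so the conversion to unsigned is lossless.
  std::uintptr_t base_bits = reinterpret_cast<std::uintptr_t>(base);
  // Unsigned integer addition is well defined even when base is null, unlike
  // pointer arithmetic on a null pointer with a non-zero offset.
  std::uintptr_t addr = base_bits + static_cast<std::uintptr_t>(byte_offset);
  return reinterpret_cast<void*>(addr);
}
```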
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19087#discussion_r1592015457 From shade at openjdk.org Tue May 7 08:32:58 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 May 2024 08:32:58 GMT Subject: RFR: 8331714: Make OopMapCache installation lock-free In-Reply-To: References: Message-ID: <3VOqc8uqGvZRi13zhuE0qGuUJdbU1lGjKjeyzSj1TyQ=.b325144e-8ee2-4711-9de4-fcc66ce4475d@github.com> On Mon, 6 May 2024 10:02:40 GMT, Aleksey Shipilev wrote: > Trying to solve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572) runs into all sorts of lock ranking issues with `OopMapCacheAlloc_lock`. I think it would be a bit saner to rewrite the double-checked locking installation to atomic lock-free. OpenJDK code was using this lock since the initial load. > > There is a drawback that we might be trying to instantiate multiple `OopMapCache` instances from multiple threads. I think this is not a practical problem, as only a few threads would race here, and the allocation is relatively small (32*8 = 512 bytes). In imaginary worst^W nightmare case, with 100K threads racing we get a temporary native memory spike at +50M. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` tests Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19100#issuecomment-2097739754 From shade at openjdk.org Tue May 7 08:32:58 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 May 2024 08:32:58 GMT Subject: Integrated: 8331714: Make OopMapCache installation lock-free In-Reply-To: References: Message-ID: On Mon, 6 May 2024 10:02:40 GMT, Aleksey Shipilev wrote: > Trying to solve [JDK-8331572](https://bugs.openjdk.org/browse/JDK-8331572) runs into all sorts of lock ranking issues with `OopMapCacheAlloc_lock`. I think it would be a bit saner to rewrite the double-checked locking installation to atomic lock-free. OpenJDK code was using this lock since the initial load. 
> > There is a drawback that we might be trying to instantiate multiple `OopMapCache` instances from multiple threads. I think this is not a practical problem, as only a few threads would race here, and the allocation is relatively small (32*8 = 512 bytes). In imaginary worst^W nightmare case, with 100K threads racing we get a temporary native memory spike at +50M. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` tests This pull request has now been integrated. Changeset: a2584a83 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/a2584a8341b2dc9c102abd373a890b2108d3f57e Stats: 12 lines in 3 files changed: 1 ins; 3 del; 8 mod 8331714: Make OopMapCache installation lock-free Reviewed-by: zgu, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/19100 From mdoerr at openjdk.org Tue May 7 08:34:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 7 May 2024 08:34:57 GMT Subject: Integrated: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer In-Reply-To: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: On Fri, 3 May 2024 14:01:34 GMT, Martin Doerr wrote: > `index_oop_from_field_offset_long` is sometimes used to access an absolute address by using `p == nullptr`. Unfortunately, `nullptr + byte_offset` implies undefined behavior and should better get fixed. UBSan complains about it (see JBS issue). > A possible solution is to replace pointer arithmetic by integer arithmetic. We can use unsigned because `assert_field_offset_sane` checks that `byte_offset >= 0`. This pull request has now been integrated. 
Changeset: 23a72a1f Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/23a72a1f2f651d5e8e9a0eb1e75e2b44572a13da Stats: 7 lines in 1 file changed: 0 ins; 4 del; 3 mod 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer Reviewed-by: mbaesken, stefank ------------- PR: https://git.openjdk.org/jdk/pull/19087 From mdoerr at openjdk.org Tue May 7 08:34:56 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 7 May 2024 08:34:56 GMT Subject: RFR: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer [v2] In-Reply-To: References: <73h3Knwa33PoG1bq1S38-dEIKnB0lKfCKe3NwbIvNcU=.b9ea8021-5261-48eb-b0ec-cfae975477e9@github.com> Message-ID: On Mon, 6 May 2024 09:42:20 GMT, Martin Doerr wrote: >> `index_oop_from_field_offset_long` is sometimes used to access an absolute address by using `p == nullptr`. Unfortunately, `nullptr + byte_offset` implies undefined behavior and should better get fixed. UBSan complains about it (see JBS issue). >> A possible solution is to replace pointer arithmetic by integer arithmetic. We can use unsigned because `assert_field_offset_sane` checks that `byte_offset >= 0`. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Change coding style. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19087#issuecomment-2097744567 From jsjolen at openjdk.org Tue May 7 09:42:18 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 May 2024 09:42:18 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v64] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. 
This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:

 - Compress
 - Visit range in order to skip sorting

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/50f42e8c..91b8b44e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=63
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=62-63

  Stats: 110 lines in 5 files changed: 38 ins; 38 del; 34 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org Tue May 7 09:59:38 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 7 May 2024 09:59:38 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v65]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 132 additional commits since the last revision: - Merge remote-tracking branch 'openjdk/master' into nmt-physical-device - Compress - Visit range in order to skip sorting - Do not do that many operations when verifying the treap - Remove friend class VMATree from Treap - Switch out manual loop in VMATree to functor iterators in Treap - Forgot to initialize - Move comma - Store NCS:s on the side for 4-byte pointers to each NCS - Monotonic ordering of keys - ... and 122 more: https://git.openjdk.org/jdk/compare/d58a870e...daba4b5d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/91b8b44e..daba4b5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=64 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=63-64 Stats: 4645 lines in 231 files changed: 2884 ins; 878 del; 883 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From rehn at openjdk.org Tue May 7 10:44:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 May 2024 10:44:04 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. > Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. 
> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. > The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. > - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. 
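A side note on the `call offset` expansion in the table above (`auipc` followed by `jalr`): `jalr` sign-extends its 12-bit immediate, so the upper 20 bits handed to `auipc` have to be rounded up whenever bit 11 of the offset is set. The sketch below only illustrates that arithmetic; `PcRelParts` and `split_pcrel_offset` are hypothetical names and not part of the MacroAssembler in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only, not HotSpot code.
// Splits a pc-relative offset into the part materialized by auipc
// (a multiple of 4096) and the signed 12-bit part added by jalr.
struct PcRelParts {
  int64_t hi20;  // value for auipc's upper-immediate field
  int64_t lo12;  // jalr immediate, always in [-2048, 2047]
};

inline PcRelParts split_pcrel_offset(int64_t offset) {
  // Sign-extend the low 12 bits, mirroring what jalr does with its immediate.
  int64_t lo12 = ((offset & 0xfff) ^ 0x800) - 0x800;
  // offset - lo12 is an exact multiple of 4096, so the division is exact.
  int64_t hi20 = (offset - lo12) / 4096;
  return {hi20, lo12};
}
```

For example, an offset of 2048 splits into `hi20 = 1` and `lo12 = -2048`. Since `hi20` has to fit in 20 signed bits, this form only reaches roughly 2 GiB in either direction, which matches the "small enough application" caveat in the description.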
> > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Use li instead of movptr for call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/38bd4187..8408c027 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=06-07 Stats: 8 lines in 2 files changed: 5 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From rehn at openjdk.org Tue May 7 10:44:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 May 2024 10:44:04 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v5] In-Reply-To: References: <2gtLyG74zJAPBvSyAMrJV5hGnT6KQgobNPOLlg85s90=.2dec9679-7b9d-400e-932a-f16be22dad1d@github.com> <0CA1KZQ1O5QsAPTPb2qnk_LqAVjrpec3-fh5dXvZOG0=.dd163617-9cc2-443c-8727-ca3a723071c6@github.com> Message-ID: On Tue, 7 May 2024 07:25:02 GMT, Fei Yang wrote: >> I don't find a nice way to get li for them. >> https://github.com/openjdk/jdk/blob/3b8227ba24c7bc05a8ea23801e3816e8fc80de4e/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp#L272 >> >> Any suggestions? >> >> As I said, if we want to be performant here we should implement the "StubRoutines::dpow()". >> >> What you think? > > Hmm .. It does look bad to me if we use mv instead of movptr like you do in your previous commit [1]. > Seems that a mv is more reasonable than movptr in this case when addr is out of code cache, isn't it? > And it's not solely about the currently call sites like in the interpreter, we should also consider future possible uses of `call`. > > (Or keep the old `call` to be conservative for this PR?) 
> > [1] https://github.com/openjdk/jdk/pull/18942/commits/d8fbb00b51afbf6a3b7dcd828f623b4add949faf As I can't fix this with a few changes, and I don't want to mess up this PR. I added check for li directly in the call. Sanity tested with -XX:ReservedCodeCacheSize=2047M ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1592234233 From rkennke at openjdk.org Tue May 7 10:58:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 7 May 2024 10:58:53 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v4] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Mon, 6 May 2024 12:03:03 GMT, Axel Boldt-Christmas wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove extra whitespace > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5770: > >> 5768: sub(tmp5, zr, cnt1, LSL, 3 + log_elem_size); >> 5769: ldr(tmp3, Address(pre(a1, start_offset))); >> 5770: ldr(tmp4, Address(pre(a2, start_offset))); > > Is the use of `pre` intentional? If that is the case why would that be better than just a `reg + offset`? (Given that `a1` and `a2` are not used again) No, this is a copy+paste mistake. Will fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1592272700 From jsjolen at openjdk.org Tue May 7 11:06:35 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 May 2024 11:06:35 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v66] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. 
This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
> > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: ifdef after includes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/daba4b5d..510d7a3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=65 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=64-65 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From rkennke at openjdk.org Tue May 7 11:07:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 7 May 2024 11:07:54 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v4] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Tue, 7 May 2024 08:17:43 GMT, Andrew Haley wrote: > This still looks too complicated for such a simple thing. Why not read the whole array, fully aligned, until the final word (at start+length*elementSize-wordSize) which is possibly unaligned? That would work regardless of alignment. Check the lengths agree first, and the first whole read may or may not include the length field. I disagree. 
This means we would access the length 2x unconditionally (at least in the -CCP/Lilliput path), and need to generate extra code (in the +CCP path for the length check, in the -CCP path to also increase the cnt2 length counter). I suspect that reading lengths 2x could add up pretty quick when comparing short arrays (which is pretty common, e.g. strings). What we *could* do instead is unconditionally compare the lengths in the loop. This means that in the +CCP path, we would also have to compare the compressed Klass*, but I think this should be ok. I don't think that 8-byte loads are any slower than 4-byte loads, and we're actually saving the extra instructions ahead of the loop. And we would not have 2 different paths for -CCP vs +CCP. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18948#issuecomment-2098139105 From rkennke at openjdk.org Tue May 7 11:11:05 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 7 May 2024 11:11:05 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v5] In-Reply-To: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: > The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. > > The proposed fix aims to always enter the main loop(s) with an aligned address: > - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. 
> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. > > Testing: > - [x] tier1 (+CCP) > - [x] tier1 (-CCP) > - [x] tier2 (+CCP) > - [x] tier2 (-CCP) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @xmas92 review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18948/files - new: https://git.openjdk.org/jdk/pull/18948/files/cca53b89..031c91e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=03-04 Stats: 15 lines in 1 file changed: 6 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18948.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18948/head:pull/18948 PR: https://git.openjdk.org/jdk/pull/18948 From jsjolen at openjdk.org Tue May 7 11:56:33 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 May 2024 11:56:33 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v67] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with four additional commits since the last revision: - Remove GEQ_B - Move things around slightly to be closer to usage - Simplify code - Remove superfluous comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/510d7a3c..78b75213 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=66 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=65-66 Stats: 28 lines in 1 file changed: 3 ins; 18 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Tue May 7 11:56:33 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 May 2024 11:56:33 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v5] In-Reply-To: References: <-XAziSwGMo20pUAnbdRW1JUk_0ZB-80RVfAHr0iuewE=.bff8f2f7-01e2-46eb-bd4b-1b16fccc6aa1@github.com> <3al4DjsRcIX_qJZNbTGqBDIAOj4bU5l8xpYPHQE8cNM=.7cc0bdfe-c9c8-46ce-ad42-397c61b5a603@github.com> <7u3imUh6-qb_wLdyZ4mn5SfnEOkxyFEQ20O0fb6WJj0=.3179edcb-0340-4d50-a674-c18128cc2e2f@github.com> <3z6o8urlRN3qEViyH6CMdXYByP0LR8mMKBYVe9_xKGI=.db9bb8d2-d4c1-4b23-9667-c0a9b7d7b94f@github.com> Message-ID: On Sun, 5 May 2024 09:28:09 GMT, Johan Sj?len wrote: >> Ah, I get the confusion. This is not what I meant. >> >> What I mean was: >> >> At the moment you malloc space for NativeCallStack, then keep NativeCallStack* in the hash map. NativeCallStack* now uniquely identifies your stack. >> >> What I meant is to place NativeCallStack in a growable array. Now, you have a 32-bit or even a 16-bit index into that array. That index uniquely identifies the stack. 
You keep that index in the hashmap. The hashmap does not change. Hashmap storage has nothing to do with that array. This is not the bucket array. >> >> Basically, you replace the malloc for the NativeCallStack with a placement-new in a new growable array. The rest stays the same. >> >> But now, you have a 32-bit or even 16-bit index, and that is smaller than a native pointer, which makes it possible to encode the stack information in a tree node much more succinctly. This makes it possible to encode the whole tree node metainfo very comfortably in a single 64-bit value. You can even get both in- and out-state of the VMATree into a single 64-bit value like this: >> >> bits 0-7 MEMFLAGS in >> bits 8-15 State in >> bits 16-31 callstack index in >> >> bits 32-39 MEMFLAGS out >> bits 40-47 State out >> bits 48-63 callstack index out > > Oh right of course, just store the NCS separately from the closed-addressing hashtable. I'm going for a 32-bit value just because that's the quickest. We can do a further compression round in a future PR. > > If we really wanted to, we could also store the `Link`s in a GA and thus reduce their pointer sizes to 32 bits also. Still, future PR, IMHO. We're down to a 4-byte `StackIndex`, hooray! Resolving this conversation now; let's go further down this route in a future RFE.
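For readers following the thread, here is a rough sketch of the two ideas discussed above: keeping call stacks in a growable array so that a small index identifies each stack, and packing the in- and out-state of a tree node into one 64-bit value (8 bits of flag, 8 bits of state, 16 bits of stack index per direction, taking the state as bits 8-15). All names here (`CallStack`, `CallStackStorage`, `HalfState`, `pack_node`) are invented for illustration and `std::vector` stands in for HotSpot's `GrowableArray`; the patch itself may look quite different, and the deduplicating hashmap is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for NativeCallStack.
struct CallStack { uintptr_t frames[4]; };

// Stacks live in one array; the index into it identifies a stack.
class CallStackStorage {
  std::vector<CallStack> _stacks;  // stand-in for a GrowableArray
public:
  using StackIndex = uint32_t;
  StackIndex push(const CallStack& s) {
    _stacks.push_back(s);
    return static_cast<StackIndex>(_stacks.size() - 1);
  }
  const CallStack& get(StackIndex i) const { return _stacks[i]; }
};

// One direction (in or out) of a node: flag, state, 16-bit stack index.
struct HalfState { uint8_t flag; uint8_t state; uint16_t stack; };

inline uint32_t pack_half(HalfState h) {
  return uint32_t(h.flag) | (uint32_t(h.state) << 8) | (uint32_t(h.stack) << 16);
}

inline HalfState unpack_half(uint32_t v) {
  return { uint8_t(v & 0xff), uint8_t((v >> 8) & 0xff), uint16_t(v >> 16) };
}

// Low 32 bits hold the in-state, high 32 bits hold the out-state.
inline uint64_t pack_node(HalfState in, HalfState out) {
  return uint64_t(pack_half(in)) | (uint64_t(pack_half(out)) << 32);
}
```

The 16-bit variant caps the number of distinct stacks at 65536, which is presumably one reason to start with a plain 32-bit index and leave further compression for later.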
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592346276 From jsjolen at openjdk.org Tue May 7 11:56:33 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 May 2024 11:56:33 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v5] In-Reply-To: <3z6o8urlRN3qEViyH6CMdXYByP0LR8mMKBYVe9_xKGI=.db9bb8d2-d4c1-4b23-9667-c0a9b7d7b94f@github.com> References: <-XAziSwGMo20pUAnbdRW1JUk_0ZB-80RVfAHr0iuewE=.bff8f2f7-01e2-46eb-bd4b-1b16fccc6aa1@github.com> <3al4DjsRcIX_qJZNbTGqBDIAOj4bU5l8xpYPHQE8cNM=.7cc0bdfe-c9c8-46ce-ad42-397c61b5a603@github.com> <7u3imUh6-qb_wLdyZ4mn5SfnEOkxyFEQ20O0fb6WJj0=.3179edcb-0340-4d50-a674-c18128cc2e2f@github.com> <3z6o8urlRN3qEViyH6CMdXYByP0LR8mMKBYVe9_xKGI=.db9bb8d2-d4c1-4b23-9667-c0a9b7d7b94f@github.com> Message-ID: <8zlh2tct178ScSv30gJrs94kElKiMtIzrXAd9vHnpRs=.3e31351e-e1ad-4567-bfc2-6682f6056bfe@github.com> On Tue, 30 Apr 2024 08:01:55 GMT, Thomas Stuefe wrote: >If you want to keep the linked list - after all, this is just a performance- and memory-optimization - why not just return a const NativeCallStack* instead of an index? Because it made refactoring into a 4-byte index trivial ;-)! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592350252 From jsjolen at openjdk.org Tue May 7 12:04:59 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 May 2024 12:04:59 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v46] In-Reply-To: References: Message-ID: <1Qlng-GCLxD3vC4_kbRr9jVbkAqOkF_wEYquJSvte74=.8bbaf0f3-4f15-4c88-af64-e4e4ba2e15ba@github.com> On Thu, 25 Apr 2024 10:23:35 GMT, Johan Sj?len wrote: >> src/hotspot/share/nmt/vmatree.cpp line 242: >> >>> 240: } >>> 241: return diff; >>> 242: } >> >> Would be nice if we can break this function into some smaller sub-functions. It is 200+ line now and little hard to track the logic. Thanks! 
> > Sure, I think there are a couple of cases which are actual functions (taking input, producing output, nothing else), those can be converted. I've cleaned up the code a bit. I don't think there's much of a point in hiding the computations behind subfunctions, as it reads from top to bottom. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592364938 From bkilambi at openjdk.org Tue May 7 12:46:55 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 7 May 2024 12:46:55 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 05:50:13 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following tests pass and show a definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into dev > - Update vm_version_aarch64.hpp > - 8331558: AArch64: optimize integer remainder > > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > > Add full platform coverage for Neoverse variants in vm_version.?pp Hi, on which machine were the reported performance numbers (in the commit message) measured? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19093#issuecomment-2098322163 From stuefe at openjdk.org Tue May 7 13:08:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 7 May 2024 13:08:59 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v46] In-Reply-To: References: <8Pkr2lOm0YS7yPAEZooSGXR1WhOwyDkv2ej0qxCOKp4=.513c6399-f24e-4145-bcc9-e19eb0243949@github.com> Message-ID: On Fri, 3 May 2024 08:25:04 GMT, Johan Sjölen wrote: >> I think we should rethink recording specific stacks for uncommitted memory. I don't believe anyone cares who reserves uncommitted memory; or who uncommits memory. And this only leads to splintering the tree, if we uncommit from different callsites. We should consider keeping stacks for committed memory only, and use some noop stack placeholder for uncommitted memory. > > The issue, as I see it, is that we think of committing memory as a "layering" on top of reserving memory, and when that commit goes away the underlying layer of reserved memory is exposed again. In our VMATree, we don't store that underlying reservation anymore. > > So what to do? If we add callstack and MEMFLAGS for uncommitting memory then that's an easy solution. The best would be to keep VMT's semantics here. We can do that, if the metadata stored is doubled in size per node and we recognise this pattern. > > Still, I'll re-iterate: This is a problem for tomorrow, when we do port VMT.
Note that the callstack information will be wrong for uncommitted memory though, since what should happen is that the now-again-reserved region should be re-accounted to the callstack that originally did the total reservation, not the callstack now doing the uncommitting. But tbh, as much as I used NMT with customer cases over the years, the detail view with callstacks was never of much use for the mmap case anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592450143 From mli at openjdk.org Tue May 7 13:34:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 13:34:02 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV Message-ID: Hi, Can you review this patch to add the ReverseBytesV intrinsic? Thanks. ------------- Commit messages: - remove reverse bits - fix test filter - fix zvbb flag; fix tests - merge master - ReverseV/ReverseBytesV: Initial Commit Changes: https://git.openjdk.org/jdk/pull/19120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322753 Stats: 50 lines in 11 files changed: 45 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19120/head:pull/19120 PR: https://git.openjdk.org/jdk/pull/19120 From mli at openjdk.org Tue May 7 13:36:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 13:36:08 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Both auto-vectorization and the Vector API depend on this intrinsic. > Thanks! > > ## Performance > No performance test was done, as this depends on the vcpop.v instruction in the Zvbb extension, and the code sequence is much simpler than the non-intrinsic version.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: mark UseZvbb experimenal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19065/files - new: https://git.openjdk.org/jdk/pull/19065/files/74692e23..f316f660 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19065/head:pull/19065 PR: https://git.openjdk.org/jdk/pull/19065 From mli at openjdk.org Tue May 7 13:36:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 13:36:09 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v2] In-Reply-To: References: <7eoTRo9miet61MlKRv6MRwFY3HCjyG4RiW6RGGJ4sAM=.982fad36-b2d9-4103-8a02-eca041a40e7d@github.com> Message-ID: On Fri, 3 May 2024 18:56:48 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix minor flag > > src/hotspot/cpu/riscv/globals_riscv.hpp line 118: > >> 116: product(bool, UseZihintpause, false, EXPERIMENTAL, \ >> 117: "Use Zihintpause instructions") \ >> 118: product(bool, UseZvbb, false, "Use Zvbb instructions") \ > > Shouldn't this be marked `EXPERIMENTAL` as we have no hardware to test it on? Thanks, you're right. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1592492566 From fyang at openjdk.org Tue May 7 13:55:55 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 May 2024 13:55:55 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 10:44:04 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. 
>> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. 
>> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use li instead of movptr for call src/hotspot/cpu/riscv/jniFastGetField_riscv.cpp line 178: > 176: ExternalAddress target(slow_case_addr); > 177: __ relocate(target.rspec(), [&] { > 178: __ call(target.target()); Should we revert this change after your last commit? As I think call is now not necessarily la + jalr. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1592527384 From rkennke at openjdk.org Tue May 7 14:00:07 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 7 May 2024 14:00:07 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v6] In-Reply-To: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: <1BfXhOjLGLcOI7zwKBMp5MiE0NgHRJ0rTfgDBRDIDPw=.a9ac9448-5f14-44b7-b4a3-d831b596e6d6@github.com> > The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. > > The proposed fix aims to always enter the main loop(s) with an aligned address: > - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. 
> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. > > Testing: > - [x] tier1 (+CCP) > - [x] tier1 (-CCP) > - [x] tier2 (+CCP) > - [x] tier2 (-CCP) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Unify impls for +/-CCP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18948/files - new: https://git.openjdk.org/jdk/pull/18948/files/031c91e4..a812f698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=04-05 Stats: 58 lines in 1 file changed: 2 ins; 46 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/18948.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18948/head:pull/18948 PR: https://git.openjdk.org/jdk/pull/18948 From mbaesken at openjdk.org Tue May 7 14:00:13 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 7 May 2024 14:00:13 GMT Subject: RFR: 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' Message-ID: When using ubsan (configure flag --enable-ubsan) on macOS x86_64 we run into this error : /jdk/src/hotspot/share/runtime/deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' #0 0x10247693e in restore_eliminated_locks(JavaThread*, GrowableArray*, bool, frame&, int, bool&) deoptimization.cpp:403 #1 0x102474b6f in Deoptimization::fetch_unroll_info_helper(JavaThread*, int) deoptimization.cpp:552 #2 0x10247fae9 in Deoptimization::uncommon_trap(JavaThread*, int, int) deoptimization.cpp:2624 #3 0x12846ab80 () Reason might be an uninitialized bool variable on one code path, which is unused in the calling function anyway. 
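The suspected cause above (a `bool` that is only assigned on some paths and later read) can be illustrated with a minimal sketch. The names `relock_objects_toy` and `caller_sees_deoptimized` are hypothetical and are not taken from deoptimization.cpp; the sketch only assumes the fix is of the "initialize at declaration" kind, since the actual one-line patch is not shown in this message.

```cpp
#include <cassert>

// Hypothetical helper with a bool out-parameter that is assigned
// only on one of its paths.
static bool relock_objects_toy(int monitor_count, bool& deoptimized_objects) {
  assert(monitor_count >= 0);
  if (monitor_count == 0) {
    return false;  // early exit: the out-parameter is left untouched
  }
  deoptimized_objects = true;
  return true;
}

// Hypothetical caller. Without the initializer, reading the flag after
// the early-exit path would load an indeterminate value, which is the
// kind of load ubsan reports as "not a valid value for type 'bool'".
bool caller_sees_deoptimized(int monitor_count) {
  bool deoptimized_objects = false;  // initialize so every path is defined
  relock_objects_toy(monitor_count, deoptimized_objects);
  return deoptimized_objects;
}
```

With the initializer in place, the early-exit path yields a well-defined `false` instead of whatever byte happened to be on the stack.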
------------- Commit messages: - JDK-8331789 Changes: https://git.openjdk.org/jdk/pull/19121/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19121&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331789 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19121/head:pull/19121 PR: https://git.openjdk.org/jdk/pull/19121 From rkennke at openjdk.org Tue May 7 14:00:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 7 May 2024 14:00:08 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v5] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Tue, 7 May 2024 11:11:05 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. 
>> >> Testing: >> - [x] tier1 (+CCP) >> - [x] tier1 (-CCP) >> - [x] tier2 (+CCP) >> - [x] tier2 (-CCP) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > @xmas92 review > > This still looks too complicated for such a simple thing. Why not read the whole array, fully aligned, until the final word (at start+length*elementSize-wordSize) which is possibly unaligned? That would work regardless of alignment. Check the lengths agree first, and the first whole read may or may not include the length field. > > I disagree. This means we would access the length 2x unconditionally (at least in the -CCP/Lilliput path), and need to generate extra code (in the +CCP path for the length check, in the -CCP path to also increase the cnt2 length counter). I suspect that reading lengths 2x could add up pretty quick when comparing short arrays (which is pretty common, e.g. strings). > > What we _could_ do instead is unconditionally compare the lengths in the loop. This means that in the +CCP path, we would also have to compare the compressed Klass*, but I think this should be ok. I don't think that 8-byte loads are any slower than 4-byte loads, and we're actually saving the extra instructions ahead of the loop. And we would not have 2 different paths for -CCP vs +CCP. > > What do you think? I pushed a change that unifies and simplifies the implementation as described above. If that's not what we want, I would revert it back. 
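The unified approach described above (start at the 8-byte-aligned word that holds the length and compare whole words, so that the length check happens inside the loop) can be modelled in plain C++. This is only a toy sketch, not the AArch64 intrinsic: the object layout, `make_array`, and `arrays_equal_words` are assumptions made for illustration, and the toy zero-fills the tail padding so that whole-word compares are valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy byte-array "object": zeroed header, 4-byte length at length_offset,
// payload right after the length, zero padding up to a multiple of 8 bytes.
std::vector<uint8_t> make_array(size_t length_offset,
                                const std::vector<uint8_t>& payload) {
  assert(length_offset % 4 == 0);
  size_t used = length_offset + 4 + payload.size();
  size_t size = (used + 7) & ~size_t(7);  // object size is 8-byte aligned
  std::vector<uint8_t> obj(size, 0);
  uint32_t len = static_cast<uint32_t>(payload.size());
  std::memcpy(obj.data() + length_offset, &len, sizeof(len));
  if (!payload.empty()) {
    std::memcpy(obj.data() + length_offset + 4, payload.data(), payload.size());
  }
  return obj;
}

// Compare two toy arrays word by word, starting at the aligned word that
// contains the length field. A length mismatch is caught by the first
// compare, so nothing past the shorter object is ever read.
bool arrays_equal_words(const uint8_t* a, const uint8_t* b,
                        size_t length_offset) {
  uint32_t len;
  std::memcpy(&len, a + length_offset, sizeof(len));
  size_t start = length_offset & ~size_t(7);
  size_t end = (length_offset + 4 + len + 7) & ~size_t(7);
  for (size_t off = start; off < end; off += 8) {
    uint64_t wa, wb;
    std::memcpy(&wa, a + off, sizeof(wa));
    std::memcpy(&wb, b + off, sizeof(wb));
    if (wa != wb) return false;
  }
  return true;
}
```

In this model a `length_offset` of 12 stands for the +CCP layout (the first word also covers the 4 bytes in front of the length) and 16 stands for the -CCP/Lilliput layout (the first word covers the length plus the first payload bytes); both go through the same loop.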
------------- PR Comment: https://git.openjdk.org/jdk/pull/18948#issuecomment-2098474942 From bkilambi at openjdk.org Tue May 7 14:03:56 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 7 May 2024 14:03:56 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 05:50:13 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this pacth(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this pacth(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this pacth(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this pacth(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into dev > - Update vm_version_aarch64.hpp > - 8331558: AArch64: optimize integer remainder > > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. 
> - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > > Add full platform coverage for Neoverse variants in vm_version.?pp src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 462: > 460: if (VM_Version::supports_a53mac() && Ra != zr) > 461: nop(); > 462: if (VM_Version::is_neoverse_n_series()) { Why only Neoverse N series? Even on the V series (V1 and V2), both `sdiv/udiv` and `msub` instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. Source: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/ and https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/ A quick run on a V1 machine shows ~15% performance gain for the `IntegerDivMod` tests if we generate separate `mul` and `sub` instructions instead of a single `msub`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1592539756 From rehn at openjdk.org Tue May 7 14:12:56 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 May 2024 14:12:56 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:53:41 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Use li instead of movptr for call > > src/hotspot/cpu/riscv/jniFastGetField_riscv.cpp line 178: > >> 176: ExternalAddress target(slow_case_addr); >> 177: __ relocate(target.rspec(), [&] { >> 178: __ call(target.target()); > > Should we revert this change after your last commit? As I think call is now not necessarily la + jalr. The addresses should come from JNI_FastGetField::generate_fast_get_XXX_field0. So slow_case_addr is not really an ExternalAddress. This means call will always do auipc + jalr as you know (intra code cache). 
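For readers less familiar with the `auipc` + `jalr` pair mentioned above: `jalr` sign-extends its 12-bit immediate, so the upper part given to `auipc` has to be rounded to compensate. The arithmetic can be sketched as follows; `split_offset` and `recombine` are hypothetical helper names for illustration, not MacroAssembler functions.

```cpp
#include <cassert>
#include <cstdint>

struct HiLo {
  int32_t hi20;  // immediate for auipc (upper 20 bits, after rounding)
  int32_t lo12;  // signed 12-bit immediate for jalr
};

// Split a pc-relative offset so that (hi20 << 12) + lo12 == offset.
// Offsets in the top 2 KiB below 2^31 do not fit and trip the assert.
HiLo split_offset(int32_t offset) {
  // Sign-extend the low 12 bits: the result is in [-2048, 2047].
  int32_t lo12 = ((offset & 0xfff) ^ 0x800) - 0x800;
  // The difference is an exact multiple of 4096, so the division is exact.
  int64_t hi = (static_cast<int64_t>(offset) - lo12) / 4096;
  assert(hi >= -(1 << 19) && hi < (1 << 19));
  return {static_cast<int32_t>(hi), lo12};
}

// What the hardware effectively computes relative to pc.
int64_t recombine(HiLo p) {
  return static_cast<int64_t>(p.hi20) * 4096 + p.lo12;
}
```

For example, an offset of 0x1800 splits into hi20 = 2 and lo12 = -2048 rather than hi20 = 1 and lo12 = 0x800, because 0x800 is not representable as a signed 12-bit immediate.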
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1592552786 From fyang at openjdk.org Tue May 7 14:22:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 May 2024 14:22:54 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v3] In-Reply-To: References: Message-ID: <0fE8F3ygX9kbh181YwQrnSoxNyspx3fxm1zxsTmOJbc=.e9812c16-7c1a-4df8-b28e-884ed87c1f00@github.com> On Tue, 7 May 2024 13:36:08 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Both auto-vect and vector api depends on this intrinsic. >> Thanks! >> >> ## Performance >> Not performance test was done, as this depends on vcpop.v instruction in zvbb extension and the code seqeunce is rather simple than non-intrinsic version. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > mark UseZvbb experimenal src/hotspot/cpu/riscv/riscv_v.ad line 3795: > 3793: instruct vpopcount_mask(vReg dst_src, vRegMask_V0 v0) %{ > 3794: match(Set dst_src (PopCountVI dst_src v0)); > 3795: match(Set dst_src (PopCountVL dst_src v0)); Is there a reason to force input & output being the same vector register? src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 180: > 178: if (is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZVBB)) { > 179: VM_Version::ext_Zvbb.enable_feature(); > 180: } I don't think it's appropriate to auto-enable an experimental extension. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1592566821 PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1592568545 From stefank at openjdk.org Tue May 7 14:24:53 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 7 May 2024 14:24:53 GMT Subject: RFR: 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:55:16 GMT, Matthias Baesken wrote: > When using ubsan (configure flag --enable-ubsan) on macOS x86_64 we run into this error : > > /jdk/src/hotspot/share/runtime/deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' > #0 0x10247693e in restore_eliminated_locks(JavaThread*, GrowableArray*, bool, frame&, int, bool&) deoptimization.cpp:403 > #1 0x102474b6f in Deoptimization::fetch_unroll_info_helper(JavaThread*, int) deoptimization.cpp:552 > #2 0x10247fae9 in Deoptimization::uncommon_trap(JavaThread*, int, int) deoptimization.cpp:2624 > #3 0x12846ab80 () > > Reason might be an uninitialized bool variable on one code path, which is unused in the calling function anyway. Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19121#pullrequestreview-2043358524 From aboldtch at openjdk.org Tue May 7 14:39:51 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 7 May 2024 14:39:51 GMT Subject: RFR: 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:55:16 GMT, Matthias Baesken wrote: > When using ubsan (configure flag --enable-ubsan) on macOS x86_64 we run into this error : > > /jdk/src/hotspot/share/runtime/deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' > #0 0x10247693e in restore_eliminated_locks(JavaThread*, GrowableArray*, bool, frame&, int, bool&) deoptimization.cpp:403 > #1 0x102474b6f in Deoptimization::fetch_unroll_info_helper(JavaThread*, int) deoptimization.cpp:552 > #2 0x10247fae9 in Deoptimization::uncommon_trap(JavaThread*, int, int) deoptimization.cpp:2624 > #3 0x12846ab80 () > > Reason might be an uninitialized bool variable on one code path, which is unused in the calling function anyway. Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19121#pullrequestreview-2043396930 From matsaave at openjdk.org Tue May 7 15:06:53 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 May 2024 15:06:53 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap In-Reply-To: References: Message-ID: On Mon, 6 May 2024 22:32:12 GMT, Chris Plummer wrote: >> The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. 
>> >> These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. > > test/hotspot/jtreg/serviceability/sa/TestSysProps.java line 68: > >> 66: } >> 67: if (numProps != expectedCount) { >> 68: throw new RuntimeException("Wrong number of " + cmdName + " properties: " + numProps + " Expected: " + expectedCount); > > I think it would be good to add parenthesis around the extra output you added. This was an accidental leftover from debugging, I didn't intend for this to be part of the change. I should revert this since it's beyond the scope of this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19107#discussion_r1592648930 From mli at openjdk.org Tue May 7 16:00:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 16:00:07 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Both auto-vect and vector api depends on this intrinsic. > Thanks! > > ## Performance > Not performance test was done, as this depends on vcpop.v instruction in zvbb extension and the code seqeunce is rather simple than non-intrinsic version. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19065/files - new: https://git.openjdk.org/jdk/pull/19065/files/f316f660..cbfde208 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=02-03 Stats: 9 lines in 3 files changed: 0 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19065/head:pull/19065 PR: https://git.openjdk.org/jdk/pull/19065 From mli at openjdk.org Tue May 7 16:00:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 16:00:09 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v3] In-Reply-To: <0fE8F3ygX9kbh181YwQrnSoxNyspx3fxm1zxsTmOJbc=.e9812c16-7c1a-4df8-b28e-884ed87c1f00@github.com> References: <0fE8F3ygX9kbh181YwQrnSoxNyspx3fxm1zxsTmOJbc=.e9812c16-7c1a-4df8-b28e-884ed87c1f00@github.com> Message-ID: On Tue, 7 May 2024 14:19:10 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> mark UseZvbb experimenal > > src/hotspot/cpu/riscv/riscv_v.ad line 3795: > >> 3793: instruct vpopcount_mask(vReg dst_src, vRegMask_V0 v0) %{ >> 3794: match(Set dst_src (PopCountVI dst_src v0)); >> 3795: match(Set dst_src (PopCountVL dst_src v0)); > > Is there a reason to force input & output being the same vector register? Seems not, I modify it to usual pattern. > src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 180: > >> 178: if (is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZVBB)) { >> 179: VM_Version::ext_Zvbb.enable_feature(); >> 180: } > > I don't think it's appropriate to auto-enable an experimental extension. Thanks for catching, fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1592724179 PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1592724503 From matsaave at openjdk.org Tue May 7 16:38:23 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 May 2024 16:38:23 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap [v2] In-Reply-To: References: Message-ID: > The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. > > These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Chris comments and cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19107/files - new: https://git.openjdk.org/jdk/pull/19107/files/c925025e..d40afef9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19107&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19107&range=00-01 Stats: 10 lines in 3 files changed: 0 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19107.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19107/head:pull/19107 PR: https://git.openjdk.org/jdk/pull/19107 From luhenry at openjdk.org Tue May 7 16:55:00 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 7 May 2024 16:55:00 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v4] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:00:07 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? 
>> Both auto-vect and vector api depends on this intrinsic. >> Thanks! >> >> ## Performance >> Not performance test was done, as this depends on vcpop.v instruction in zvbb extension and the code seqeunce is rather simple than non-intrinsic version. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor fix Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19065#pullrequestreview-2043720393 From luhenry at openjdk.org Tue May 7 18:03:54 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 7 May 2024 18:03:54 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV In-Reply-To: References: Message-ID: <71dgzGhNrtu95zP0OtGxZ-cxJ3kwGcV2lbF8oHcjeBM=.05f90142-3bbb-49d5-b2a2-c41408a90b19@github.com> On Tue, 7 May 2024 13:29:33 GMT, Hamlin Li wrote: > Hi, > Can you review this patch to add ReverseBytesV intrinsic? > Thanks. src/hotspot/cpu/riscv/globals_riscv.hpp line 118: > 116: product(bool, UseZihintpause, false, EXPERIMENTAL, \ > 117: "Use Zihintpause instructions") \ > 118: product(bool, UseZvbb, false, "Use Zvbb instructions") \ That'll conflict with https://github.com/openjdk/jdk/pull/19065, but same, we'd want to have `EXPERIMENTAL` src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 182: > 180: } > 181: if (is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZVBB)) { > 182: VM_Version::ext_Zvbb.enable_feature(); Same as https://github.com/openjdk/jdk/pull/19065, we don't want to enable experimental extensions via hwprobe. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1592872466 PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1592873103 From sviswanathan at openjdk.org Tue May 7 18:24:03 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 7 May 2024 18:24:03 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity 
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 383: > 381: { > 382: Label L_short; > 383: A comment here: // Broadcast the beginning of needle into a vector register. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 390: > 388: __ vpbroadcastb(byte_0, Address(needle, 0), Assembler::AVX_256bit); > 389: } > 390: A comment here: // Broadcast the end of needle into a vector register. This step is not needed for single element needle. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 418: > 416: __ cmpq(haystack_len, 0x10); > 417: __ ja_b(L_moreThan16); > 418: An assert here to check for header size >= 16 would be good. Also a comment here would be good, something like: // Copy 16 or 32 bytes prior to haystack end onto stack // This will possibly include some object header bytes when haystack length is less than 16 or 32 bytes // Set the new haystack address to beginning of copied haystack on stack adjusting for extra bytes copied src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 498: > 496: > 497: // big_case_loop_helper will fall through to this point if one or more potential matches are found > 498: // The mask will have a bitmask indicating the position of the potential matches within the haystack If no potential match, which label does the big_case_loop_helper jump to? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 517: > 515: __C2 arrays_equals(false, haystackStart, firstNeedleCompare, compLen, retval, rScratch, xmm_tmp3, xmm_tmp4, > 516: false /* char */, knoreg); > 517: __ testl(retval, retval); Since this is byte compare even for isU, the retval here could be a 64-bit quantity so the testl should be a testq.
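The `testl` versus `testq` concern can be shown with a small scalar model: a 32-bit test only looks at the low half of the register, so a value whose only nonzero bits sit in the upper half would be treated as zero. The function names below are hypothetical, and whether `arrays_equals` can actually leave such a value in `retval` is the reviewer's question rather than something this sketch establishes.

```cpp
#include <cassert>
#include <cstdint>

// Zero flag after `test reg, reg` at 32-bit width (testl).
bool zf_after_testl(uint64_t reg) { return static_cast<uint32_t>(reg) == 0; }

// Zero flag after `test reg, reg` at 64-bit width (testq).
bool zf_after_testq(uint64_t reg) { return reg == 0; }

// True when the 32-bit test would report "zero" for a nonzero register.
bool testl_misses(uint64_t reg) {
  bool missed = zf_after_testl(reg) && !zf_after_testq(reg);
  assert(!missed || (reg >> 32) != 0);  // only upper-half bits can be missed
  return missed;
}
```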
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 553: > 551: // Haystack always copied to stack, so 32-byte reads OK > 552: // Haystack length < 32 > 553: // 10 < needle length < 32 The comment below may need update as we come here for needle_len > OPT_NEEDLE_SIZE_MAX which is currently set as 5: // 10 < needle length < 32 src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 611: > 609: __C2 arrays_equals(false, rTmp, firstNeedleCompare, compLen, rTmp3, rTmp2, xmm_tmp3, xmm_tmp4, false /* char */, > 610: knoreg); > 611: __ testl(rTmp3, rTmp3); Since this is byte compare even for isU, the rtmp3 here could be a 64-bit quantity so the testl should be a testq. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 629: > 627: > 628: __ bind(L_returnError); > 629: __ movq(rbp, -1); This could directly be rax instead of intermediate rbp and then moving from rbp to rax. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 633: > 631: > 632: __ bind(L_returnZero); > 633: __ xorl(rbp, rbp); This could directly be rax instead of intermediate rbp and then moving from rbp to rax. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592791718 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592792401 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592774634 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592866631 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592868501 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592880650 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592885514 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592892211 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592892329 From mli at openjdk.org Tue May 7 20:05:54 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 May 2024 20:05:54 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV In-Reply-To: <71dgzGhNrtu95zP0OtGxZ-cxJ3kwGcV2lbF8oHcjeBM=.05f90142-3bbb-49d5-b2a2-c41408a90b19@github.com> References: <71dgzGhNrtu95zP0OtGxZ-cxJ3kwGcV2lbF8oHcjeBM=.05f90142-3bbb-49d5-b2a2-c41408a90b19@github.com> Message-ID: On Tue, 7 May 2024 18:00:37 GMT, Ludovic Henry wrote: >> Hi, >> Can you review this patch to add ReverseBytesV intrinsic? >> Thanks. > > src/hotspot/cpu/riscv/globals_riscv.hpp line 118: > >> 116: product(bool, UseZihintpause, false, EXPERIMENTAL, \ >> 117: "Use Zihintpause instructions") \ >> 118: product(bool, UseZvbb, false, "Use Zvbb instructions") \ > > That'll conflict with https://github.com/openjdk/jdk/pull/19065, but same, we'd want to have `EXPERIMENTAL` Yes, this will be fixed after https://github.com/openjdk/jdk/pull/19065 by merging from master. 
> src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 182: > >> 180: } >> 181: if (is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZVBB)) { >> 182: VM_Version::ext_Zvbb.enable_feature(); > > Same as https://github.com/openjdk/jdk/pull/19065, we don't want to enable experimental extensions via hwprobe. Same as above. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593001036 PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593001423 From sviswanathan at openjdk.org Tue May 7 20:40:59 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 7 May 2024 20:40:59 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 
16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 576: > 574: broadcast_additional_needles(false, 0 /* unknown */, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, needle, needleLen, rTmp3, > 575: isUU, isUL, _masm); > 576: Good to pass output xmm registers to this method. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 587: > 585: // firstNeedleCompare has address of second element of needle > 586: // compLen has length of comparison to do > 587: This is not clear. firstNeedleCompare gets needle + NUMBER_OF_NEEDLE_BYTES_TO_COMPARE - 1 which is not necessarily the second element of needle. If it helps let us fix the NUMBER_OF_NEEDLE_BYTES_TO_COMPARE to 3 and have comments and code versus that only. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 590: > 588: compare_haystack_to_needle(false, 0, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, L_returnRBP, haystack, isU, > 589: DO_EARLY_BAILOUT, mask, needleLen, rTmp3, _masm); > 590: It is better to pass the broadcasted xmm registers to compare_haystack_to_needle. Basically pass input, output, and temps to all the methods. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 639: > 637: __ movl(rax, r8); > 638: __ subq(rcx, rbx); > 639: __ addq(rcx, rax); This could be: __ subq(rcx, rbx); __ addq(rcx, r8); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 647: > 645: __ cmpq(r11, r10); > 646: __ movq(rbp, -1); > 647: __ cmovq(Assembler::belowEqual, rbp, r11); This could be directly computed in rax: __ movq(rax, -1); __ cmovq(Assembler::belowEqual, rax, r11); Also is it possible to not do cmov on some paths? It is an expensive operation.
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1010: > 1008: static void broadcast_additional_needles(bool sizeKnown, int size, int bytesToCompare, Register needle, > 1009: Register needleLen, Register rTmp, bool isUU, bool isUL, > 1010: MacroAssembler *_masm) { Good to add output XMM registers to the parameter list. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1040: > 1038: __ vpbroadcastb(byte_1, Address(needle, 1), Assembler::AVX_256bit); > 1039: } > 1040: } It will be good to have a function which broadcasts a needle element from a given offset into a vector register. That function could take (needle address, offset, output vector register, temps). Such a function could then be called twice from here and from main function for offset 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593046499 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593057834 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593045710 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592989197 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592992225 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593023349 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593006539 From cjplummer at openjdk.org Tue May 7 21:41:54 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 7 May 2024 21:41:54 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap [v2] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:38:23 GMT, Matias Saavedra Silva wrote: >> The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region.
As a further improvement, all the pointers to RO tables are replaced with offsets as well. >> >> These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Chris comments and cleanup SA changes look good. Thanks for taking care of this. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19107#pullrequestreview-2044246122 From sviswanathan at openjdk.org Wed May 8 00:26:59 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 8 May 2024 00:26:59 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1082: > 1080: // noMatch - label bound outside to jump to if there is no match > 1081: // haystack - the address of the first byte of the haystack > 1082: // hsLen - the sizeof the haystack Good to specify if the size (size of needle) and hsLen (size of haystack) is in bytes or elements. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1149: > 1147: > 1148: if (size == (isU ? 2 : 1)) { > 1149: __ vpmovmskb(eq_mask, cmp_0, Assembler::AVX_256bit); vpmovmskb is being done twice if doEarlyBailout is set to 1 (the setting we have currently). 
If it helps to simplify, we could assume that doEarlyBailout is always set to 1 and remove this configurability. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1174: > 1172: #define lastMask rTmp > 1173: __ vpmovmskb(lastMask, cmp_k, Assembler::AVX_256bit); > 1174: __ shrq(lastMask); Did you mean to shift the lastMask by shiftVal here? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1185: > 1183: if (size > (isU ? 4 : 2)) { > 1184: if (doEarlyBailout) { > 1185: __ testl(eq_mask, eq_mask); The masks are 32 bits wide, as we compare at most 32 bytes (256 bits) at a time. So we could consistently do either andl, testl, shrl or andq, testq, shrq. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593225178 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593225488 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593227487 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1593229554 From john.r.rose at oracle.com Wed May 8 00:47:14 2024 From: john.r.rose at oracle.com (John Rose) Date: Tue, 07 May 2024 17:47:14 -0700 Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v7] In-Reply-To: References: Message-ID: > > Drive-by comment: do we really need to combine input and output capabilities in one class, fileStream? Why not a dedicated fileOutputStream? APIs like setPosition() do not make much sense on an output stream. No, we don't need bidirectional capability on one class. That's just the way fileStream was coded in the beginning of HotSpot. This PR does not propose to redesign fileStream. That class is really ancient, and probably has lots more technical debt we could fix. There's plenty of debt to go around, for many PRs. Touching any part of this stream stuff certainly does provoke plenty of larger questions of technical debt, and how to reduce it moving forward. I suppose we could remove the old function rewind (not used anywhere?)
and also the proposed functions set/get_position and remaining. The point is whether we want capable file streams or not. I think we do, but of course the question is always which capabilities. And I don't want to try to settle that question in this PR; I just want to make incremental progress. Ioi, I would support reducing coupling with fileStream by removing old rewind and new position/remaining functions. But I'd rather keep the new functions, because I think they are likely to be useful. I have future uses in mind, which might or might not happen. For example, open a CDS archive or large config file, position the fileStream at the base address of some textual configuration data, and start reading. Whether we do that or not, the idea of positioning seems natural enough to HotSpot to put in now, or else later in a similar form. From duke at openjdk.org Wed May 8 01:04:37 2024 From: duke at openjdk.org (Jin Guojie) Date: Wed, 8 May 2024 01:04:37 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement.
> > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: Applicable platforms expanded to the entire neoverse family Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/786d5016..d8b8dbfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From kbarrett at openjdk.org Wed May 8 01:19:55 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 8 May 2024 01:19:55 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 09:58:12 GMT, Liang Mao wrote: >> The pre-write barrier of G1 is used to capture the object disconnected from the marking graph which could be unmarked aka *white* and stored into *black* objects then break tri-color invariance. 
But references in new allocated objects are created in object initialization after marking start and never could be white. So we don't need pre-write barrier for stores from new allocated objects. The same mechanism is also used for barrier eliminantion in GenZGC. >> >> Additional testing: >> - [x] Linux aarch64 server release/fastdebug, test/hotspot/jtreg/gc with +UseG1GC >> - [x] Run several iterations of SPECjbb2015 with aggressively frequent concurrent mark > > Liang Mao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'openjdk/master' into 8331711 > - 8331711: G1 doesn't need pre write barrier for stores from new allocated objects src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 199: > 197: assert(val_type != nullptr, "need a type"); > 198: > 199: if (use_ReduceInitialCardMarks() && obj == kit->just_allocated_object(kit->control())) { This isn't correct. It doesn't ensure there aren't any stores to the same location between the allocation and this store that might need to be tracked. g1_can_remove_pre_barrier (below) does that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1593261993 From kbarrett at openjdk.org Wed May 8 01:41:53 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 8 May 2024 01:41:53 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: Message-ID: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> On Wed, 8 May 2024 01:16:32 GMT, Kim Barrett wrote: >> Liang Mao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: >> >> - Merge remote-tracking branch 'openjdk/master' into 8331711 >> - 8331711: G1 doesn't need pre write barrier for stores from new allocated objects > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 199: > >> 197: assert(val_type != nullptr, "need a type"); >> 198: >> 199: if (use_ReduceInitialCardMarks() && obj == kit->just_allocated_object(kit->control())) { > > This isn't correct. It doesn't ensure there aren't any stores to the same > location between the allocation and this store that might need to be tracked. > g1_can_remove_pre_barrier (below) does that. We've had lots of problems with safepoints sneaking into unexpected places. That's why (Gen)ZGC uses late barrier expansion. That's also one of the motivations for https://openjdk.org/jeps/475. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1593277386 From fyang at openjdk.org Wed May 8 02:15:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 8 May 2024 02:15:54 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v4] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:00:07 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Both auto-vect and vector api depends on this intrinsic. >> Thanks! >> >> ## Performance >> Not performance test was done, as this depends on vcpop.v instruction in zvbb extension and the code seqeunce is rather simple than non-intrinsic version. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor fix One minor comment remains, otherwise looks good. Thanks. 
test/hotspot/jtreg/compiler/vectorization/TestPopCountVectorLong.java line 30: > 28: * @requires ((os.arch=="x86" | os.arch=="i386" | os.arch=="amd64" | os.arch=="x86_64") & vm.cpu.features ~= ".*avx512bw.*") | > 29: * os.simpleArch == "aarch64" | > 30: * (os.arch == "riscv64" & vm.cpu.features ~= ".*zvbb,.*") Suggestion: `(os.arch == "riscv64" & vm.cpu.features ~= ".*zvbb.*")` The comma should not be there. See: https://bugs.openjdk.org/browse/JDK-8327689 ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19065#pullrequestreview-2044401508 PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1593228927 From duke at openjdk.org Wed May 8 02:18:54 2024 From: duke at openjdk.org (Jin Guojie) Date: Wed, 8 May 2024 02:18:54 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: <10lMuTPPge4MQ4zMaDTw_Oyt4vPDN5DSReV2RW6rkIU=.54435524-fe72-4260-90b6-c7872ba1dacb@github.com> On Tue, 7 May 2024 14:01:31 GMT, Bhavana Kilambi wrote: >> Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into dev >> - Update vm_version_aarch64.hpp >> - 8331558: AArch64: optimize integer remainder >> >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> - 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> >> Add full platform coverage for Neoverse variants in vm_version.?pp > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 462: > >> 460: if (VM_Version::supports_a53mac() && Ra != zr) >> 461: nop(); >> 462: if (VM_Version::is_neoverse_n_series()) { > > Why only Neoverse N series? 
Even on the V series (V1 and V2), both `sdiv/udiv` and `msub` instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. Source: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/ and https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/ > > A quick run on a V1 machine shows ~15% performance gain for the `IntegerDivMod` tests if we generate separate `mul` and `sub` instructions instead of a single `msub`. Thanks for your review. This new commit includes the support for V1/V2 you mentioned. https://github.com/openjdk/jdk/pull/19093/commits/d8b8dbfe102d2716ef9e332aec7c52e566bf1727 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593296677 From duke at openjdk.org Wed May 8 02:21:03 2024 From: duke at openjdk.org (jjscl8888) Date: Wed, 8 May 2024 02:21:03 GMT Subject: RFR: 8319548: Unexpected internal name for Filler array klass causes error in VisualVM In-Reply-To: References: Message-ID: <3N7b5H6FtKT1e5pk-IDnU4GtnV1oadvj461vyBwMfRw=.0d25265e-3fa6-4923-9fc9-f9a4ba840592@github.com> On Fri, 3 May 2024 12:50:45 GMT, Thomas Schatzl wrote: > (because the bot does not seem to forward the answer from the mailing list within a few hours; fwiw, it has been pure luck that I stumbled across that question within github): > > On 30.04.24 03:38, jjscl8888 wrote: > > > Thank you for your clarification. if the instance in question had no > > traffic but you observed a sudden increase in the old generation size > > at 2:35 in the graph, and subsequent garbage collections (GCs) did not > > reduce the size of the old generation back to its original value > > Collectors are fairly reluctant to give back memory to the OS. > > For G1 in particular, there are the options `MinHeapFreeRatio` and `MaxHeapFreeRatio` which to some degree steer commit and uncommit. > > * `MinHeapFreeRatio` is "The minimum percentage of heap free after GC to avoid expansion", i.e. 
minimum amount of memory should be kept free. Default is 40%, i.e. expands if less than that amount of memory is free. > * `MaxHeapFreeRatio` is "The maximum percentage of heap free after GC to avoid shrinking", i.e. maximum amount of memory that should be kept free. Default is 70%; i.e. only shrinks the heap if more than 70% of memory is free. > > Not sure the latter condition is met here to shrink, and without logs (`-Xlog:gc+ergo+heap=debug`) this is just a guess. Also, this kind of heap resizing (including shrinking) only occurs in the Remark pause. > > So to decrease the heap more aggressively, it might work to decrease `MaxHeapFreeRatio` (and probably `MinHeapFreeRatio` too because for such large heaps the default values are maybe not optimal). > > Hth, Thomas Thank you for your previous question. I have another inquiry regarding compiling the JDK source code. I've noticed that when I compile the JDK without selecting specific configure parameters, the resulting JDK size differs from the official version available on the website. I'm curious to know which configuration parameters were used for the official LTS (Long-Term Support) version of the JDK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17155#issuecomment-2099609444 From duke at openjdk.org Wed May 8 02:26:58 2024 From: duke at openjdk.org (Jin Guojie) Date: Wed, 8 May 2024 02:26:58 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 10:08:30 GMT, Jin Guojie wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 447: >> >>> 445: inline void msub(Register Rd, Register Rn, Register Rm, Register Ra) { >>> 446: if (VM_Version::supports_a53mac() && Ra != zr) >>> 447: nop(); >> >> It was in JDK-8079203 [1] for the first time. May I ask what's the specials on a53mac? >> >> [1] https://github.com/openjdk/jdk/commit/a65f9f95894e22ce2fd160024ce46f6aaa6c8bd3 > > This code entered the JDK in 2015. 
Frankly, I have no idea why an extra nop is needed on CPUs with the a53mac feature. > Perhaps the author of patch a65f9f9589, enevill at openjdk.org, could explain? > It was in JDK-8079203 [1] for the first time. May I ask what's the specials on a53mac? > > [1] [a65f9f9](https://github.com/openjdk/jdk/commit/a65f9f95894e22ce2fd160024ce46f6aaa6c8bd3) @e1iu The feature is clearly described in this material: **Cortex-A53 MPCore Product Revision r0 - Software Developers Errata Notice** https://developer.arm.com/documentation/EPM048406/2000/?lang=en > 835769: AArch64 multiply-accumulate instruction might produce incorrect result > > Description > When executing in AArch64 state, some multiply-accumulate instructions which read an accumulator operand from the > result of an earlier multiply instruction might produce incorrect results. > > Workaround > The only viable workaround is to avoid any of these code sequences, typically by avoiding the use of multiply- > accumulate instructions, or by inserting a NOP between any adjacent load/store/prefetch instruction and multiply- > accumulate instruction with no data dependency between them. > ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593300840 From lmao at openjdk.org Wed May 8 02:41:51 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 8 May 2024 02:41:51 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: On Wed, 8 May 2024 01:38:46 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 199: >> >>> 197: assert(val_type != nullptr, "need a type"); >>> 198: >>> 199: if (use_ReduceInitialCardMarks() && obj == kit->just_allocated_object(kit->control())) { >> >> This isn't correct. 
It doesn't ensure there aren't any stores to the same >> location between the allocation and this store that might need to be tracked. >> g1_can_remove_pre_barrier (below) does that. > > We've had lots of problems with safepoints sneaking into unexpected places. > That's why (Gen)ZGC uses late barrier expansion. That's also one of the > motivations for https://openjdk.org/jeps/475. > This isn't correct. It doesn't ensure there aren't any stores to the same location between the allocation and this store that might need to be tracked. g1_can_remove_pre_barrier (below) does that. Hi Kim, g1_can_remove_pre_barrier is not sufficient which only eliminates stores with previous "null" value from object initialization. But actually no matter the original value is null or not stores to new allocated objects don't need SATB barrier at all. SATB pre-write barriers are delete protection barrier to protect reference disconnected from graph which is not necessary for new allocated objects. Any stored pointers into new alllocated objects are guaranted not white. GenZGC uses the same way the store barrier could be removed only if allocations dominate without safepoint blocking. The pre-write barrier elimination should be same to post-write barrier for new allocated objects. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1593309336 From lmao at openjdk.org Wed May 8 02:46:55 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 8 May 2024 02:46:55 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: On Wed, 8 May 2024 02:39:40 GMT, Liang Mao wrote: >> We've had lots of problems with safepoints sneaking into unexpected places. >> That's why (Gen)ZGC uses late barrier expansion. That's also one of the >> motivations for https://openjdk.org/jeps/475. 
> >> This isn't correct. It doesn't ensure there aren't any stores to the same location between the allocation and this store that might need to be tracked. g1_can_remove_pre_barrier (below) does that. > > Hi Kim, g1_can_remove_pre_barrier is not sufficient which only eliminates stores with previous "null" value from object initialization. But actually no matter the original value is null or not stores to new allocated objects don't need SATB barrier at all. SATB pre-write barriers are delete protection barrier to protect reference disconnected from graph which is not necessary for new allocated objects. Any stored pointers into new alllocated objects are guaranted not white. GenZGC uses the same way the store barrier could be removed only if allocations dominate without safepoint blocking. The pre-write barrier elimination should be same to post-write barrier for new allocated objects. > We've had lots of problems with safepoints sneaking into unexpected places. That's why (Gen)ZGC uses late barrier expansion. That's also one of the motivations for https://openjdk.org/jeps/475. That's problems with Load barrier because Load nodes in C2 could be rescheduled into other basic blocks which would cause safepoint between Load barrier and Load. Store nodes don't have this problem. This PR doesn't conflict with JEP475 and would reduce the barrier data in very early stage. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1593311684 From Bruno.Borges at microsoft.com Wed May 8 03:25:30 2024 From: Bruno.Borges at microsoft.com (Bruno Borges) Date: Wed, 8 May 2024 03:25:30 +0000 Subject: Discuss: Prevent jlink runtimes from reading _JAVA_OPTIONS Message-ID: In this Reddit discussion [1], the user complains that a jlinked runtime of their application, packaged with jpackage, was failing to some degree due to the environment variable _JAVA_OPTIONS being set somewhere else in the system. 
I do agree with the user that a runtime shipped as a built-in component of a Java-based standalone application should not have its properties altered due to a magical environment variable. I'd like to ask if it is reasonable to suggest that in the case of a jlinked runtime, this should not happen. [1] https://www.reddit.com/r/java/s/4nF4S1Kpgb Thanks, Bruno From eliu at openjdk.org Wed May 8 03:26:53 2024 From: eliu at openjdk.org (Eric Liu) Date: Wed, 8 May 2024 03:26:53 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 01:04:37 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this pacth(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this pacth(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this pacth(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this pacth(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: > > Applicable platforms expanded to the entire neoverse family > > Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit (Integer multi cycle).
It should benefit the V series as well. src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 465: > 463: /* On Neoverse, MSUB uses the same ALU with SDIV. > 464: * The combination of MUL/SUB can utilize multiple ALUs, > 465: * and is much faster than MSUB. */ Please refine this comment. I suppose this combination can benefit other situations in which multiple instructions grab the M0, not just MSUB + SDIV. src/hotspot/cpu/aarch64/vm_version_aarch64.hpp line 181: > 179: (model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2)); > 180: } > 181: Not in use. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593330024 PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593312244 From eliu at openjdk.org Wed May 8 03:30:52 2024 From: eliu at openjdk.org (Eric Liu) Date: Wed, 8 May 2024 03:30:52 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: <10lMuTPPge4MQ4zMaDTw_Oyt4vPDN5DSReV2RW6rkIU=.54435524-fe72-4260-90b6-c7872ba1dacb@github.com> References: <10lMuTPPge4MQ4zMaDTw_Oyt4vPDN5DSReV2RW6rkIU=.54435524-fe72-4260-90b6-c7872ba1dacb@github.com> Message-ID: On Wed, 8 May 2024 02:16:32 GMT, Jin Guojie wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 462: >> >>> 460: if (VM_Version::supports_a53mac() && Ra != zr) >>> 461: nop(); >>> 462: if (VM_Version::is_neoverse_n_series()) { >> >> Why only Neoverse N series? Even on the V series (V1 and V2), both `sdiv/udiv` and `msub` instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. Source: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/ and https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/ >> >> A quick run on a V1 machine shows ~15% performance gain for the `IntegerDivMod` tests if we generate separate `mul` and `sub` instructions instead of a single `msub`. > > Thanks for your review.
> This new commit includes the support for V1/V2 you mentioned. > > https://github.com/openjdk/jdk/pull/19093/commits/d8b8dbfe102d2716ef9e332aec7c52e566bf1727 > Why only Neoverse N series? Even on the V series (V1 and V2), both `sdiv/udiv` and `msub` instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. Source: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/ and https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/ > > A quick run on a V1 machine shows ~15% performance gain for the `IntegerDivMod` tests if we generate separate `mul` and `sub` instructions instead of a single `msub`. Not sure if this can benefit V3, since MSUB can use M rather M0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593337735 From Alan.Bateman at oracle.com Wed May 8 04:43:27 2024 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Wed, 8 May 2024 05:43:27 +0100 Subject: Discuss: Prevent jlink runtimes from reading _JAVA_OPTIONS In-Reply-To: References: Message-ID: <97917bce-8e4e-48d6-a459-5dc166a7b288@oracle.com> On 08/05/2024 04:25, Bruno Borges wrote: > In this Reddit discussion [1], the user complains that a jlinked > runtime of their application, packaged with jpackage, was failing to > some degree due to the environment variable _JAVA_OPTIONS being set > somewhere else in the system. > > I do agree with the user that a runtime shipped as a built-in > component of a Java-based standalone application should not have its > properties altered due to a magical environment variable. > > I'd like to ask if it is reasonable to suggest that in the case of a > jlinked runtime, this should not happen. > There was another thread about this a few days ago [1]. 
-Alan [1] https://mail.openjdk.org/pipermail/hotspot-dev/2024-May/088245.html From Bruno.Borges at microsoft.com Wed May 8 05:00:45 2024 From: Bruno.Borges at microsoft.com (Bruno Borges) Date: Wed, 8 May 2024 05:00:45 +0000 Subject: [EXTERNAL] Re: Discuss: Prevent jlink runtimes from reading _JAVA_OPTIONS In-Reply-To: <97917bce-8e4e-48d6-a459-5dc166a7b288@oracle.com> References: <97917bce-8e4e-48d6-a459-5dc166a7b288@oracle.com> Message-ID: Thanks Alan. I'll follow up there. Sent from mobile device. ________________________________ From: Alan Bateman Sent: Tuesday, May 7, 2024 9:43:27 PM To: Bruno Borges ; hotspot-dev at openjdk.org Subject: [EXTERNAL] Re: Discuss: Prevent jlink runtimes from reading _JAVA_OPTIONS On 08/05/2024 04:25, Bruno Borges wrote: > In this Reddit discussion [1], the user complains that a jlinked > runtime of their application, packaged with jpackage, was failing to > some degree due to the environment variable _JAVA_OPTIONS being set > somewhere else in the system. > > I do agree with the user that a runtime shipped as a built-in > component of a Java-based standalone application should not have its > properties altered due to a magical environment variable. > > I'd like to ask if it is reasonable to suggest that in the case of a > jlinked runtime, this should not happen. > There was another thread about this a few days ago [1]. -Alan [1] https://mail.openjdk.org/pipermail/hotspot-dev/2024-May/088245.html
From fyang at openjdk.org Wed May 8 05:14:53 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 8 May 2024 05:14:53 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 14:10:03 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/jniFastGetField_riscv.cpp line 178: >> >>> 176: ExternalAddress target(slow_case_addr); >>> 177: __ relocate(target.rspec(), [&] { >>> 178: __ call(target.target()); >> >> Should we revert this change after your last commit? As I think call is now not necessarily la + jalr. > > The addresses should come from JNI_FastGetField::generate_fast_get_XXX_field0. > So slow_case_addr is not really an ExternalAddress. > This means call will always do auipc + jalr as you know (intra code cache). Hmm... So I did a quick try on linux-aarch64 invoking `CodeCache::contains` on slow_case_addr and the result is false. Anything I missed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1593392866 From eosterlund at openjdk.org Wed May 8 05:32:52 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 8 May 2024 05:32:52 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: On Wed, 8 May 2024 02:44:03 GMT, Liang Mao wrote: >>> This isn't correct. It doesn't ensure there aren't any stores to the same location between the allocation and this store that might need to be tracked. g1_can_remove_pre_barrier (below) does that. >> >> Hi Kim, g1_can_remove_pre_barrier is not sufficient which only eliminates stores with previous "null" value from object initialization. But actually no matter the original value is null or not stores to new allocated objects don't need SATB barrier at all.
SATB pre-write barriers are deletion-protection barriers that protect references disconnected from the graph, which is not necessary for newly allocated objects. Any pointers stored into newly allocated objects are guaranteed not to be white. GenZGC works the same way: the store barrier can be removed only if the allocation dominates the store without an intervening safepoint. Pre-write barrier elimination should be the same as post-write barrier elimination for newly allocated objects. > >> We've had lots of problems with safepoints sneaking into unexpected places. That's why (Gen)ZGC uses late barrier expansion. That's also one of the motivations for https://openjdk.org/jeps/475. > > Those are problems with load barriers, because Load nodes in C2 can be rescheduled into other basic blocks, which could put a safepoint between the load barrier and the Load. Store nodes don't have this problem. This PR doesn't conflict with JEP 475 and would reduce the barrier data at a very early stage. I think that store capturing of initializing stores already removes most of the barriers of this category. We do that a bit later on. We find initializing stores onto newly allocated objects, and replace the store with barriers, with a store without barriers. That one usually elides a large portion of store barriers. Did you find any example where you have a newly allocated object with stores that are not initializing, hence not elided, which unnecessarily invoked a pre-write barrier, but not a post-write barrier? Just trying to find out what the problem space is, that this fixes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1593404114 From lmao at openjdk.org Wed May 8 06:13:53 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 8 May 2024 06:13:53 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: On Wed, 8 May 2024 05:30:04 GMT, Erik Österlund wrote: > I think that store capturing of initializing stores already removes most of the barriers of this category. We do that a bit later on. We find initializing stores onto newly allocated objects, and replace the store with barriers, with a store without barriers. That one usually elides a large portion of store barriers. Did you find any example where you have a newly allocated object with stores that are not initializing, hence not elided, which unnecessarily invoked a pre-write barrier, but not a post-write barrier? Just trying to find out what the problem space is, that this fixes. Hi Erik, I found examples in testing that g1_can_remove_pre_barrier does not filter. But statistics I gathered on SPECjbb2015 show that if g1_can_remove_pre_barrier runs first, it elides most of the pre-barriers, and "obj == kit->just_allocated_object" finds only very few remaining opportunities. If the condition "obj == kit->just_allocated_object" runs first, it covers ~30% of the opportunities. I think this PR is technically correct, but it's up to the reviewers to decide whether we need it in practice. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1593436110 From rehn at openjdk.org Wed May 8 06:27:53 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 8 May 2024 06:27:53 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 05:12:00 GMT, Fei Yang wrote: >> The addresses should come from JNI_FastGetField::generate_fast_get_XXX_field0. >> So slow_case_addr is not really an ExternalAddress. >> This means call will always do auipc + jalr as you know (intra code cache). > > Hmm... So I did a quick try on linux-aarch64 invoking `CodeCache::contains` on slow_case_addr and the result is false. Anything I missed? JNI_FastGetField::generate_fast_get_XXX_field0 writes the code into a CodeBuffer; this is where slow_case_addr points. I test with ReservedCodeCacheSize=2047M, so relocation would fail if we generated a li(). (There is usually around 120 MB between them.) I added your assert, and it also passes: ExternalAddress target(slow_case_addr); + assert(CodeCache::contains(slow_case_addr), "Must be"); __ relocate(target.rspec(), [&] { I guess you are running on Apple; the code concerning "static_fast_get_field_wrapper" smells. Maybe you found a bug here. 
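To illustrate the reach of the auipc + jalr pair discussed above, here is a minimal sketch. It is not code from the PR: `PcRelSplit` and `split_pc_relative` are hypothetical names, and it assumes the usual RV64 encoding, where auipc adds a sign-extended 20-bit immediate shifted left by 12 and jalr adds a signed 12-bit immediate, giving a reach of roughly 2 GiB in either direction from the pc.

```cpp
#include <cstdint>

// Hypothetical helper, not HotSpot code: split a pc-relative byte offset
// into the two parts that an auipc + jalr pair can encode.
struct PcRelSplit {
  int64_t hi;      // value added by auipc (always a multiple of 4096)
  int64_t lo;      // signed 12-bit immediate added by jalr
  bool reachable;  // true if hi fits auipc's signed 20-bit immediate << 12
};

inline PcRelSplit split_pc_relative(int64_t offset) {
  // Round to the nearest multiple of 4096 so that lo lands in [-2048, 2047].
  int64_t hi = (offset + 0x800) & ~int64_t(0xfff);
  int64_t lo = offset - hi;
  const int64_t hi_min = -(int64_t(1) << 31);
  const int64_t hi_max = (int64_t(1) << 31) - 4096;
  return PcRelSplit{hi, lo, hi >= hi_min && hi <= hi_max};
}
```

Under these assumptions, an offset of about 120 MB is comfortably reachable, while an offset of exactly 2 GiB in the positive direction is not.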
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1593449303 From mbaesken at openjdk.org Wed May 8 07:07:56 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 8 May 2024 07:07:56 GMT Subject: RFR: 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:55:16 GMT, Matthias Baesken wrote: > When using ubsan (configure flag --enable-ubsan) on macOS x86_64 we run into this error : > > /jdk/src/hotspot/share/runtime/deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' > #0 0x10247693e in restore_eliminated_locks(JavaThread*, GrowableArray*, bool, frame&, int, bool&) deoptimization.cpp:403 > #1 0x102474b6f in Deoptimization::fetch_unroll_info_helper(JavaThread*, int) deoptimization.cpp:552 > #2 0x10247fae9 in Deoptimization::uncommon_trap(JavaThread*, int, int) deoptimization.cpp:2624 > #3 0x12846ab80 () > > Reason might be an uninitialized bool variable on one code path, which is unused in the calling function anyway. Hi Axel and Stefan, thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19121#issuecomment-2099886159 From mbaesken at openjdk.org Wed May 8 07:07:56 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 8 May 2024 07:07:56 GMT Subject: Integrated: 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:55:16 GMT, Matthias Baesken wrote: > When using ubsan (configure flag --enable-ubsan) on macOS x86_64 we run into this error : > > /jdk/src/hotspot/share/runtime/deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' > #0 0x10247693e in restore_eliminated_locks(JavaThread*, GrowableArray*, bool, frame&, int, bool&) deoptimization.cpp:403 > #1 0x102474b6f in Deoptimization::fetch_unroll_info_helper(JavaThread*, int) deoptimization.cpp:552 > #2 0x10247fae9 in Deoptimization::uncommon_trap(JavaThread*, int, int) deoptimization.cpp:2624 > #3 0x12846ab80 () > > Reason might be an uninitialized bool variable on one code path, which is unused in the calling function anyway. This pull request has now been integrated. 
Changeset: 2baacfc1 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/2baacfc16916220846743c6e49a99a6c41cac510 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool' Reviewed-by: stefank, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/19121 From aboldtch at openjdk.org Wed May 8 07:49:55 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 8 May 2024 07:49:55 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v6] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> <1BfXhOjLGLcOI7zwKBMp5MiE0NgHRJ0rTfgDBRDIDPw=.a9ac9448-5f14-44b7-b4a3-d831b596e6d6@github.com> Message-ID: On Wed, 8 May 2024 07:18:06 GMT, Axel Boldt-Christmas wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Unify impls for +/-CCP > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5558: > >> 5556: int start_offset = length_is_8aligned ? length_offset : klass_offset; >> 5557: assert(is_aligned(start_offset, BytesPerWord), "start offset must be 8-byte-aligned"); >> 5558: int extra_length = length_is_8aligned ? base_offset - length_offset : base_offset - klass_offset; > > Some of these asserts and comments seem a little bit out of place, inaccurate and/or incomplete with the new change to include the klass in the comparison. > > Suggestion: > > // When the length offset is not aligned to 8 bytes, > // then we let the compare loop include the klass > // and length, otherwise start from the length. > bool length_is_8aligned = is_aligned(length_offset, BytesPerWord); > int start_offset = length_is_8aligned ? 
length_offset : klass_offset; > assert(is_aligned(start_offset, BytesPerWord), "start offset must be 8-byte-aligned"); > int extra_length = base_offset - start_offset; > assert(!length_is_8aligned || extra_length == BytesPerInt, > "no padding between length and base"); > assert(length_is_8aligned || extra_length == BytesPerWord, > "no padding between klass, length and base"); Also, right now `length_is_8aligned` very much just looks like a `!oopDesc::has_klass_gap()` check. The validity of choosing to start from the klass when `!length_is_8aligned` is that we must then be using `UseCompressedClassPointers /* && !UseCompactObjectHeaders for Lilliput */`. Until Lilliput, starting from the klass is valid for both `+UseCompressedClassPointers` and `-UseCompressedClassPointers`, as the klass, length and base will always be tightly packed (no padding) for byte and char type arrays. In all modes what we effectively do is `int start_offset = align_down(length_offset, BytesPerWord)`. Not sure if the intent gets clearer then. Something along the lines of: Suggestion: // When the length offset is not aligned to 8 bytes, // then we align it down, this is valid as the new // offset will always be the klass which is the same // for type arrays. 
int start_offset = align_down(length_offset, BytesPerWord); int extra_length = base_offset - start_offset; assert(start_offset == length_offset || start_offset == klass_offset, "start offset must be 8-byte-aligned or be the klass offset"); assert(base_offset != start_offset, "must include the length field"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1593537114 From aboldtch at openjdk.org Wed May 8 07:49:54 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 8 May 2024 07:49:54 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v6] In-Reply-To: <1BfXhOjLGLcOI7zwKBMp5MiE0NgHRJ0rTfgDBRDIDPw=.a9ac9448-5f14-44b7-b4a3-d831b596e6d6@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> <1BfXhOjLGLcOI7zwKBMp5MiE0NgHRJ0rTfgDBRDIDPw=.a9ac9448-5f14-44b7-b4a3-d831b596e6d6@github.com> Message-ID: On Tue, 7 May 2024 14:00:07 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. 
>> >> Testing: >> - [x] tier1 (+CCP) >> - [x] tier1 (-CCP) >> - [x] tier2 (+CCP) >> - [x] tier2 (-CCP) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Unify impls for +/-CCP The change to include the class looked a little scary to me. Both that this code makes even more assumptions of the object layout (without asserts), and with its interactions/meaning in Lilliput. Because this is only used for byte and char type arrays including the klass should be harmless. I have two different suggestions, I believe both will work in Lilliput as well. The second is a bit cleaner imo. Regardless the comments and asserts should be cleaned up. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5558: > 5556: int start_offset = length_is_8aligned ? length_offset : klass_offset; > 5557: assert(is_aligned(start_offset, BytesPerWord), "start offset must be 8-byte-aligned"); > 5558: int extra_length = length_is_8aligned ? base_offset - length_offset : base_offset - klass_offset; Some of these asserts and comments seem a little bit out of place, inaccurate and/or incomplete with the new change to include the klass in the comparison. Suggestion: // When the length offset is not aligned to 8 bytes, // then we let the compare loop include the klass // and length, otherwise start from the length. bool length_is_8aligned = is_aligned(length_offset, BytesPerWord); int start_offset = length_is_8aligned ? 
length_offset : klass_offset; assert(is_aligned(start_offset, BytesPerWord), "start offset must be 8-byte-aligned"); int extra_length = base_offset - start_offset; assert(!length_is_8aligned || extra_length == BytesPerInt, "no padding between length and base"); assert(length_is_8aligned || extra_length == BytesPerWord, "no padding between klass, length and base"); src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5593: > 5591: bind(A_IS_NOT_NULL); > 5592: ldrw(cnt1, Address(a1, length_offset)); > 5593: assert(extra_length != 0, "expect extra length"); Not needed with my other suggested asserts. Suggestion: src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5660: > 5658: ldrw(cnt1, Address(a1, length_offset)); > 5659: cbz(a2, DONE); > 5660: assert(extra_length != 0, "expect extra length"); Not needed with my other suggested asserts. Suggestion: ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18948#pullrequestreview-2044827759 PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1593510196 PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1593510782 PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1593511187 From kevinw at openjdk.org Wed May 8 07:57:08 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 May 2024 07:57:08 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v9] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check - Add back the null checks in unpack_on_stack asserts - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check - fill_in assert update - JavaThread comment update and synchronizer check before cast - monitor->owner() == nullptr handling in fill_in - Missing include - Move is_lock_owned from Thread to JavaThread - Remove JavaThread's is_lock_owned - Feedback from Dean - ... and 4 more: https://git.openjdk.org/jdk/compare/0e646f04...f4fe65d8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/b5380800..f4fe65d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=07-08 Stats: 3410 lines in 147 files changed: 2797 ins; 268 del; 345 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From stefank at openjdk.org Wed May 8 07:59:53 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 8 May 2024 07:59:53 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 17:25:45 GMT, Tom Rodriguez wrote: > Graal still doesn't support generational ZGC though I'm actively working on it. I'm hoping to have it in before rampdown but at least until that time JVMCI needs to default to non-generational ZGC. Great that you are working on support for Generational ZGC! The intent of this JEP is to make it so that you get Generational ZGC when you specify -XX:+UseZGC, and then warn that non-generational ZGC is deprecated and warn if the user explicitly turns it on. 
I don't want us to change the meaning of -XX:+UseZGC when the Graal JIT is enabled. Doing so would be inconsistent, cause confusion, and could be misleading to our users. In fact, we have already seen instances of a somewhat opposite problem. Today if you try to run Generational ZGC with the Graal JIT, you get Generational ZGC but revert back to using C2. We print a warning when that happens, but we have seen users not realizing that they ran C2 instead of the Graal JIT. My proposal is to simply refuse to start the JVM and print an error message when we have conflicting GC and JIT compiler flags. With this we would have: * -XX:+UseZGC -XX: => Refuse to start with error message * -XX:+UseZGC -XX:-ZGenerational -XX: => Same as above * -XX:+UseZGC -XX:-ZGenerational -XX: => Start (with message about deprecation of non-generational ZGC) With this we would both get rid of the confusing situation where the user asks for Graal JIT but get C2 and the other confusing situation where the user asks for Generational ZGC but gets non-generational ZGC. The user will have to make a decision on what combination they want to run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18393#issuecomment-2099976099 From kevinw at openjdk.org Wed May 8 08:02:54 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 May 2024 08:02:54 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v9] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 07:57:08 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check > - Add back the null checks in unpack_on_stack asserts > - Merge remote-tracking branch 'upstream/master' into 8314225_is_lock_owned_no_monitor_chunks_check > - fill_in assert update > - JavaThread comment update and synchronizer check before cast > - monitor->owner() == nullptr handling in fill_in > - Missing include > - Move is_lock_owned from Thread to JavaThread > - Remove JavaThread's is_lock_owned > - Feedback from Dean > - ... and 4 more: https://git.openjdk.org/jdk/compare/e2822004...f4fe65d8 On the null checks in the unpack_on_stack asserts: I'm putting them back 8-) In vframeArrayElement::fill_in we iterate MonitorInfo*. That gives us the is_scalar_replaced information, which means the MonitorInfo has a null owner(), and there is an incoming param realloc_failures which can tell us if something in the preceding Deoptimization::realloc_objects call had an allocation failure, so something has a null oop (which could be a monitor owner I think we can say). If the MonitorInfo is_scalar_replaced is true, then we do dest->set_obj(nullptr); That should leave vframeArrayElement::unpack_on_stack seeing a null src->obj() where we iterate at line 315. In unpack_on_stack there is no realloc_failures flag, and we aren't using a MonitorInfo so we don't have is_scalar_replaced to check. It just has a BasicObjectLock that it pulls from the MonitorChunks array (and those aren't specifically initialized, other than what we set in fill_in). unpack_on_stack has always called src->lock()->move_to(src->obj(), top->lock()); without checking src->obj(), and this works, but there are decisions in move_to that make that work. There should be a follow-up bug on making sure this works intentionally, which I can log. So should there be a src->obj() == nullptr check in the unpack_on_stack asserts? 
Looks like yes. assert(src->obj() == nullptr || ObjectSynchronizer::current_thread_holds_lock(thread, Handle(thread, src->obj())), "should be held, before/after move_to"); ------------- PR Comment: https://git.openjdk.org/jdk/pull/18940#issuecomment-2099981747 From rkennke at openjdk.org Wed May 8 08:24:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 May 2024 08:24:18 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v7] In-Reply-To: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: <0nktj7UneZcBYuVA5uR-3JgYn1bo-H3Cpc4lIR93zeI=.603ba9f3-5480-4e55-a696-b62f9c299722@github.com> > The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. > > The proposed fix aims to always enter the main loop(s) with an aligned address: > - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. > - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. 
> > Testing: > - [x] tier1 (+CCP) > - [x] tier1 (-CCP) > - [x] tier2 (+CCP) > - [x] tier2 (-CCP) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @xmas92 review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18948/files - new: https://git.openjdk.org/jdk/pull/18948/files/a812f698..88da6b5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=05-06 Stats: 13 lines in 1 file changed: 0 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/18948.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18948/head:pull/18948 PR: https://git.openjdk.org/jdk/pull/18948 From rkennke at openjdk.org Wed May 8 08:24:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 May 2024 08:24:18 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v6] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> <1BfXhOjLGLcOI7zwKBMp5MiE0NgHRJ0rTfgDBRDIDPw=.a9ac9448-5f14-44b7-b4a3-d831b596e6d6@github.com> Message-ID: On Wed, 8 May 2024 07:40:52 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5558: >> >>> 5556: int start_offset = length_is_8aligned ? length_offset : klass_offset; >>> 5557: assert(is_aligned(start_offset, BytesPerWord), "start offset must be 8-byte-aligned"); >>> 5558: int extra_length = length_is_8aligned ? base_offset - length_offset : base_offset - klass_offset; >> >> Some of these asserts and comments seem a little bit out of place, inaccurate and/or incomplete with the new change to include the klass in the comparison. >> >> Suggestion: >> >> // When the length offset is not aligned to 8 bytes, >> // then we let the compare loop include the klass >> // and length, otherwise start from the length. 
>> bool length_is_8aligned = is_aligned(length_offset, BytesPerWord); >> int start_offset = length_is_8aligned ? length_offset : klass_offset; >> assert(is_aligned(start_offset, BytesPerWord), "start offset must be 8-byte-aligned"); >> int extra_length = base_offset - start_offset; >> assert(!length_is_8aligned || extra_length == BytesPerInt, >> "no padding between length and base"); >> assert(length_is_8aligned || extra_length == BytesPerWord, >> "no padding between klass, length and base"); > > Also right now the `length_is_8aligned` very much now just look like a `!oopDesc::has_klass_gap()` check. > > The validity of choosing to start from the klass when `!length_is_8aligned` is that we must then be using `UseCompressedClassPointers /* && !UseCompactObjectHeaders for Lilliput */`. Until Lilliput starting from the Klass is valid for both `+UseCompressedClassPointers` and `-UseCompressedClassPointers` as the klass, length and base will always be tightly packed (no padding) for byte and char type arrays. > > In all modes what we effectively do is `int start_offset = align_down(length_offset, BytesPerWord )`. Not sure if the intent gets clearer then. Something along the lines of: > Suggestion: > > // When the length offset is not aligned to 8 bytes, > // then we align it down, this is valid as the new > // offset will always be the klass which is the same > // for type arrays. > int start_offset = align_down(length_offset, BytesPerWord); > int extra_length = base_offset - start_offset; > assert(start_offset == length_offset || start_offset == klass_offset, > "start offset must be 8-byte-aligned or be the klass offset"); > assert(base_offset != start_offset, "must include the length field"); Right, not assuming layout of Klass* seems saner. BTW, including the Klass* is not only safe because it's only char[] or byte[], it's also already guaranteed to be the same by the calling APIs. 
And even if it were not (arraycopy-style) - if the Klass* could be different, then array-equality would have to check this, also, anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1593613289 From kevinw at openjdk.org Wed May 8 08:30:30 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 May 2024 08:30:30 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v10] In-Reply-To: References: Message-ID: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: null nullptr oops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18940/files - new: https://git.openjdk.org/jdk/pull/18940/files/f4fe65d8..47ddc4ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18940&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18940.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18940/head:pull/18940 PR: https://git.openjdk.org/jdk/pull/18940 From kevinw at openjdk.org Wed May 8 08:35:55 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 May 2024 08:35:55 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6] In-Reply-To: References: Message-ID: <2x4eSZf1pOpManAyWuWGaHN0jwg5TqGODoom64RT5GA=.ea9ab049-2594-49a2-9d28-f5b9639978f1@github.com> On Fri, 3 May 2024 02:01:51 GMT, Dean Long wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> monitor->owner() == nullptr handling in fill_in > > src/hotspot/share/runtime/synchronizer.cpp line 1060: > >> 1058: // the ObjectMonitor. 
>> 1059: } else if (LockingMode == LM_LEGACY && mark.has_locker() >> 1060: && JavaThread::cast(current)->is_lock_owned((address)mark.locker())) { > > This looks risky. How about guarding it with a check for current->is_Java_thread()? Yes! > src/hotspot/share/runtime/vframeArray.cpp line 94: > >> 92: assert(!monitor->owner_is_scalar_replaced() || realloc_failures, "object should be reallocated already"); >> 93: BasicObjectLock* dest = _monitors->at(index); >> 94: if (monitor->owner_is_scalar_replaced() || monitor->owner() == nullptr) { > > The only way to get a null owner is if owner_is_scalar_replaced() is true: > https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/share/runtime/stackValue.hpp#L52 > and to get this far with it still null means `realloc_failures` is true. We could avoid a null check later in unpack_on_stack if we skip adding to _monitors in this case. So maybe use a GrowableArray inside MonitorChunk and add elements using append(). > Suggestion: > > if (monitor->owner_is_scalar_replaced()) { Yes, no nullptr check needed there. As discussed, moving to a GrowableArray complicates things quite a lot. fill_in and unpack_on_stack both make a similar iteration, but with slightly different information available. Growing the array on demand and making both loops variable involves more changes in MonitorChunk and its accessors. I think we're agreed to not be that disruptive for this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1593640314 PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1593639706 From bkilambi at openjdk.org Wed May 8 08:46:56 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 8 May 2024 08:46:56 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: <10lMuTPPge4MQ4zMaDTw_Oyt4vPDN5DSReV2RW6rkIU=.54435524-fe72-4260-90b6-c7872ba1dacb@github.com> Message-ID: On Wed, 8 May 2024 03:28:20 GMT, Eric Liu wrote: >> Thanks for your review. >> This new commit includes the support for V1/V2 you mentioned. >> >> https://github.com/openjdk/jdk/pull/19093/commits/d8b8dbfe102d2716ef9e332aec7c52e566bf1727 > >> Why only Neoverse N series? Even on the V series (V1 and V2), both `sdiv/udiv` and `msub` instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. Source: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/ and https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/ >> >> A quick run on a V1 machine shows ~15% performance gain for the `IntegerDivMod` tests if we generate separate `mul` and `sub` instructions instead of a single `msub`. > > Not sure if this can benefit V3, since MSUB can use M rather than M0. Agreed. MSUB can use either M0/M1 in case of V3. Also N3 is the same as N1/N2, so that's ok. I think the changes made right now, might be ok for the currently available N/V series but when support is added in the JDK for the next versions (like N3, V3, ... Nx, Vx) then these changes may or may not be applicable. It might be better to either test for specific cpu's instead of checking for "neoverse_family" (because when V3 support is added and is included in the neoverse_family, then this check will have to be modified) or have a separate method which tests if the given CPU matches N1,N2,V1,V2. 
In the future, it's just a matter of adding more versions to this method as and when applicable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593655473 From bkilambi at openjdk.org Wed May 8 08:51:55 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 8 May 2024 08:51:55 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 01:04:37 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: > > Applicable platforms expanded to the entire neoverse family > > Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 448: > 446: if (VM_Version::supports_a53mac() && Ra != zr) > 447: nop(); > 448: if (VM_Version::is_neoverse_family()) { Thanks for changing.
However, currently neoverse_family includes N1,N2,V1 and V2 for which this change is ok but eventually when support is added for next versions of N/V series which do not require splitting `msub` into `mul` and `sub` (for ex. V3) then this check will have to be modified as V3 will need to be included in the "neoverse_family". Maybe a separate function here which checks for only those N/V series where this change will benefit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593664033 From mli at openjdk.org Wed May 8 08:53:22 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 08:53:22 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v5] In-Reply-To: References: Message-ID: <0aDSi8YyhH9uWXdFzoO2wm2Px1s3YBEFsxCGA-LGmTs=.496a9694-797a-415d-bd11-067d4e55e29e@github.com> > Hi, > Can you help to review this patch? > Both auto-vect and vector api depend on this intrinsic. > Thanks! > > ## Performance > No performance test was done, as this depends on the vcpop.v instruction in the zvbb extension and the code sequence is simpler than the non-intrinsic version.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19065/files - new: https://git.openjdk.org/jdk/pull/19065/files/cbfde208..73abb9f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19065&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19065/head:pull/19065 PR: https://git.openjdk.org/jdk/pull/19065 From mli at openjdk.org Wed May 8 08:53:22 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 08:53:22 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v4] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 00:06:23 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> minor fix > > test/hotspot/jtreg/compiler/vectorization/TestPopCountVectorLong.java line 30: > >> 28: * @requires ((os.arch=="x86" | os.arch=="i386" | os.arch=="amd64" | os.arch=="x86_64") & vm.cpu.features ~= ".*avx512bw.*") | >> 29: * os.simpleArch == "aarch64" | >> 30: * (os.arch == "riscv64" & vm.cpu.features ~= ".*zvbb,.*") > > Suggestion: `(os.arch == "riscv64" & vm.cpu.features ~= ".*zvbb.*")` > The comma should not be there. See: https://bugs.openjdk.org/browse/JDK-8327689 Thanks, fixed. 
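To see why the stray comma in the filter matters, here is a small standalone sketch. It is not part of the patch: `features_match` and `check_examples` are hypothetical helpers, the feature strings are made up for illustration (the real `vm.cpu.features` value may be formatted differently), and it assumes the `~=` filter has to match the whole string.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-string regex match, assumed here to mirror how the filter is applied.
bool features_match(const std::string& features, const std::string& pattern) {
  return std::regex_match(features, std::regex(pattern));
}

// Hypothetical feature strings, used only to show the effect of the comma.
void check_examples() {
  const std::string zvbb_in_middle = "rv64,zba,zvbb,zvfh";
  const std::string zvbb_last      = "rv64,zba,zvbb";

  // With the comma, the test is only selected when something follows zvbb.
  assert(features_match(zvbb_in_middle, ".*zvbb,.*"));
  assert(!features_match(zvbb_last, ".*zvbb,.*"));

  // Without the comma, both layouts select the test.
  assert(features_match(zvbb_in_middle, ".*zvbb.*"));
  assert(features_match(zvbb_last, ".*zvbb.*"));
}
```

A filter that fails to match does not fail the test; the test is simply not run on that machine, so this kind of mistake is easy to miss.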
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19065#discussion_r1593664140 From aph at openjdk.org Wed May 8 09:10:56 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 09:10:56 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v7] In-Reply-To: <0nktj7UneZcBYuVA5uR-3JgYn1bo-H3Cpc4lIR93zeI=.603ba9f3-5480-4e55-a696-b62f9c299722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> <0nktj7UneZcBYuVA5uR-3JgYn1bo-H3Cpc4lIR93zeI=.603ba9f3-5480-4e55-a696-b62f9c299722@github.com> Message-ID: On Wed, 8 May 2024 08:24:18 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. >> >> Testing: >> - [x] tier1 (+CCP) >> - [x] tier1 (-CCP) >> - [x] tier2 (+CCP) >> - [x] tier2 (-CCP) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > @xmas92 review Great! This is the best patch. 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5549: > 5547: = arrayOopDesc::base_offset_in_bytes(elem_size == 2 ? T_CHAR : T_BYTE); > 5548: // When the length offset is not aligned to 8 bytes, > 5549: // then we align it down, this is valid as the new Suggestion: // When the length offset is not aligned to 8 bytes, // then we align it down. This is valid because the new Correlation is not causation. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18948#pullrequestreview-2045094225 PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1593694440 From aph at openjdk.org Wed May 8 09:17:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 09:17:54 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:49:00 GMT, Bhavana Kilambi wrote: >> Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: >> >> Applicable platforms expanded to the entire neoverse family >> >> Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 448: > >> 446: if (VM_Version::supports_a53mac() && Ra != zr) >> 447: nop(); >> 448: if (VM_Version::is_neoverse_family()) { > > Thanks for changing. However, currently neoverse_family includes N1,N2,V1 and V2 for which this change is ok but eventually when support is added for next versions of N/V series which do not require splitting `msub` into `mul` and `sub` (for ex. V3) then this check will have to be modified as V3 will need to be included in the "neoverse_family". Maybe a separate function here which checks for only those N/V series where this change will benefit. Let's not try to guess about the performance of future processors. 
`is_neoverse_family()` is fine for now, because the performance issue affects all known Neoverse processors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593708588 From aph at openjdk.org Wed May 8 09:17:55 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 09:17:55 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: <_jCui--l1X46FvnnL2rhvpToPy9PBGLmIfq0XphgGSk=.7a22ad35-f8cb-46a9-9715-1ad735aca5c7@github.com> On Wed, 8 May 2024 01:04:37 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: > > Applicable platforms expanded to the entire neoverse family > > Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well.
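For reference, the arithmetic behind the change can be written out in plain C++. This is a sketch of the identity only, with hypothetical function names; C++ does not let us pick instructions, so the comments merely note which AArch64 instruction each step would correspond to, and nothing here measures performance.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Remainder with the multiply and subtract written as one expression,
// the shape that maps onto sdiv followed by a single msub.
int32_t rem_fused(int32_t a, int32_t b) {
  assert(b != 0 && !(a == std::numeric_limits<int32_t>::min() && b == -1));
  int32_t q = a / b;   // sdiv
  return a - q * b;    // msub
}

// The same remainder with the product kept as a separate step,
// the shape that maps onto sdiv, mul, sub.
int32_t rem_split(int32_t a, int32_t b) {
  assert(b != 0 && !(a == std::numeric_limits<int32_t>::min() && b == -1));
  int32_t q = a / b;   // sdiv
  int32_t p = q * b;   // mul
  return a - p;        // sub
}
```

Both functions return the same value as `a % b` for every input that passes the asserts, so the split form changes only instruction selection, not results.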
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 465: > 463: /* On Neoverse, MSUB uses the same ALU with SDIV. > 464: * The combination of MUL/SUB can utilize multiple ALUs, > 465: * and is much faster than MSUB. */ Suggestion: * The combination of MUL/SUB can utilize multiple ALUs, * and can be somewhat faster than MSUB. */ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593709793 From rkennke at openjdk.org Wed May 8 09:18:07 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 May 2024 09:18:07 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v8] In-Reply-To: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: > The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. > > The proposed fix aims to always enter the main loop(s) with an aligned address: > - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. > - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. 
> > Testing: > - [x] tier1 (+CCP) > - [x] tier1 (-CCP) > - [x] tier2 (+CCP) > - [x] tier2 (-CCP) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18948/files - new: https://git.openjdk.org/jdk/pull/18948/files/88da6b5d..84f9a933 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18948.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18948/head:pull/18948 PR: https://git.openjdk.org/jdk/pull/18948 From aph at openjdk.org Wed May 8 09:24:53 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 09:24:53 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: <71pyAhART1ZJIp59TScsQcLadK8XnF0lxYZgyi5JLP4=.14be120b-99b7-4451-8534-d7a2e2c9691f@github.com> On Wed, 8 May 2024 01:04:37 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. 
>> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: > > Applicable platforms expanded to the entire neoverse family > > Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. src/hotspot/cpu/aarch64/vm_version_aarch64.hpp line 169: > 167: return _cpu == CPU_ARM > 168: && (model_is(CPU_MODEL_NEOVERSE_N1) || model_is(CPU_MODEL_NEOVERSE_N2) || > 169: model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2)); What is this, programming made difficult? Suggestion: switch(_cpu) { case CPU_MODEL_NEOVERSE_N1: case CPU_MODEL_NEOVERSE_N2: case CPU_MODEL_NEOVERSE_V1: case CPU_MODEL_NEOVERSE_V2: return true; default: return false; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593718631 From aph at openjdk.org Wed May 8 09:32:54 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 8 May 2024 09:32:54 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: <10lMuTPPge4MQ4zMaDTw_Oyt4vPDN5DSReV2RW6rkIU=.54435524-fe72-4260-90b6-c7872ba1dacb@github.com> Message-ID: On Wed, 8 May 2024 08:44:21 GMT, Bhavana Kilambi wrote: >>> Why only Neoverse N series?
Even on the V series (V1 and V2), both `sdiv/udiv` and `msub` instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. Source: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/ and https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/ >>> >>> A quick run on a V1 machine shows ~15% performance gain for the `IntegerDivMod` tests if we generate separate `mul` and `sub` instructions instead of a single `msub`. >> >> Not sure if this can benefit V3, since MSUB can use M rather than M0. > > Agreed. MSUB can use either M0/M1 in case of V3. Also N3 is the same as N1/N2, so that's ok. I think the changes made right now, might be ok for the currently available N/V series but when support is added in the JDK for the next versions (like N3, V3, ... Nx, Vx) then these changes may or may not be applicable. It might be better to either test for specific cpu's instead of checking for "neoverse_family" (because when V3 support is added and is included in the neoverse_family, then this check will have to be modified) or have a separate method which tests if the given CPU matches N1,N2,V1,V2. In the future, it's just a matter of adding more versions to this method as and when applicable. And it might also be that separate mul/sub neither helps nor hurts V3. Let's not over-engineer this patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1593731708 From fyang at openjdk.org Wed May 8 09:32:55 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 8 May 2024 09:32:55 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v5] In-Reply-To: <0aDSi8YyhH9uWXdFzoO2wm2Px1s3YBEFsxCGA-LGmTs=.496a9694-797a-415d-bd11-067d4e55e29e@github.com> References: <0aDSi8YyhH9uWXdFzoO2wm2Px1s3YBEFsxCGA-LGmTs=.496a9694-797a-415d-bd11-067d4e55e29e@github.com> Message-ID: On Wed, 8 May 2024 08:53:22 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? 
>> Both auto-vect and vector api depend on this intrinsic. >> Thanks! >> >> ## Performance >> No performance test was done, as this depends on the vcpop.v instruction in the zvbb extension and the code sequence is simpler than the non-intrinsic version. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor fix Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19065#pullrequestreview-2045152467 From mli at openjdk.org Wed May 8 09:39:59 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 09:39:59 GMT Subject: RFR: 8320995: RISC-V: C2 PopCountVI [v4] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:52:40 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> minor fix > > Marked as reviewed by luhenry (Committer). Thanks @luhenry @RealFYang for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19065#issuecomment-2100174248 From mli at openjdk.org Wed May 8 09:40:00 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 09:40:00 GMT Subject: Integrated: 8320995: RISC-V: C2 PopCountVI In-Reply-To: References: Message-ID: On Thu, 2 May 2024 14:55:43 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Both auto-vect and vector api depend on this intrinsic. > Thanks! > > ## Performance > No performance test was done, as this depends on the vcpop.v instruction in the zvbb extension and the code sequence is simpler than the non-intrinsic version. This pull request has now been integrated.
Changeset: 1aebab78 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/1aebab780c5b84a85b6f10884d05bb29bae3c3bf Stats: 87 lines in 9 files changed: 75 ins; 0 del; 12 mod 8320995: RISC-V: C2 PopCountVI 8320996: RISC-V: C2 PopCountVL Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/19065 From dlong at openjdk.org Wed May 8 10:01:55 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 10:01:55 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v10] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:30:30 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > null nullptr oops src/hotspot/share/runtime/vframeArray.cpp line 95: > 93: BasicObjectLock* dest = _monitors->at(index); > 94: if (monitor->owner_is_scalar_replaced()) { > 95: dest->set_obj(nullptr); It looks like there is an existing bug that allows dest->lock() to be uninitialized here, which could cause problems later on in unpack_on_stack if move_to sees a "neutral" value and tries to inflate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1593771246 From ayang at openjdk.org Wed May 8 10:04:15 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 8 May 2024 10:04:15 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v5] In-Reply-To: References: Message-ID: <3KivzgORzLhAreonPr-CJki3nXgPznlKMpqI4fQCWuk=.f44a8b19-56e6-4809-ac7b-659d700407af@github.com> > It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. 
Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probably fail), fall back to full-gc. > > Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outside the VM (by a third-party tool). > > Test: tier1-6 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into s1-do-collect - merge - review - Merge branch 'master' into s1-do-collect - s1-do-collect ------------- Changes: https://git.openjdk.org/jdk/pull/19056/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=04 Stats: 566 lines in 15 files changed: 125 ins; 356 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/19056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19056/head:pull/19056 PR: https://git.openjdk.org/jdk/pull/19056 From dholmes at openjdk.org Wed May 8 10:15:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 May 2024 10:15:56 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:08:34 GMT, Doug Simon wrote: >> This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. >> >> The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode.
For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. This can fail as shown here: >> >> V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) >> V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) >> V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) >> V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) >> V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) >> V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) >> V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) >> V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) >> V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) >> V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) >> V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) >> V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) >> V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) >> V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) >> >> These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ec... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments and suggestions Updates look good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18925#pullrequestreview-2045242573 From jsjolen at openjdk.org Wed May 8 10:20:26 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 8 May 2024 10:20:26 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v68] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > ``` > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment.
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > ``` > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Style - Make tree private in VMATree - Only compile verify_self for ASSERT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/78b75213..693423ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=67 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=66-67 Stats: 27 lines in 4 files changed: 6 ins; 1 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From dnsimon at openjdk.org Wed May 8 10:21:02 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 8 May 2024 10:21:02 GMT Subject: RFR: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 09:08:34 GMT, Doug Simon wrote: >> This pull request mitigates failures
in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. >> >> The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. This can fail as shown here: >> >> V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) >> V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) >> V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) >> V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) >> V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) >> V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) >> V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) >> V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) >> V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) >> V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) >> V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) >> V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) >> V 
[jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) >> V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) >> >> These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ec... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments and suggestions Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18925#issuecomment-2100246194 From dnsimon at openjdk.org Wed May 8 10:21:03 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 8 May 2024 10:21:03 GMT Subject: Integrated: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 21:11:53 GMT, Doug Simon wrote: > This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries. > > The stack trace of an OOME will [not be allocated once all preallocated OOMEs are used up](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/memory/universe.cpp#L722). If the only heap allocations performed in stressful conditions are those of the stress test, then the [4 preallocated OOMEs](https://github.com/openjdk/jdk/blob/f1d0e715b67e2ca47b525069d8153abbb33f75b9/src/hotspot/share/runtime/globals.hpp#L800) would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. 
For example, [CompileBroker::compile_method](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/share/compiler/compileBroker.cpp#L1399) will try to resolve the string constants in the constant pool of the method about to be compiled. This can fail as shown here: > > V [jvm.dll+0x62c23a] Exceptions::_throw+0x11a (exceptions.cpp:168) > V [jvm.dll+0x62d85b] Exceptions::_throw_oop+0xab (exceptions.cpp:140) > V [jvm.dll+0xbbce78] MemAllocator::Allocation::check_out_of_memory+0x208 (memAllocator.cpp:138) > V [jvm.dll+0xbbcac8] MemAllocator::allocate+0x158 (memAllocator.cpp:377) > V [jvm.dll+0x79bd05] InstanceKlass::allocate_instance+0x95 (instanceKlass.cpp:1509) > V [jvm.dll+0x7ddeed] java_lang_String::basic_create+0x9d (javaClasses.cpp:273) > V [jvm.dll+0x7e43c0] java_lang_String::create_from_unicode+0x60 (javaClasses.cpp:291) > V [jvm.dll+0xdb91a5] StringTable::do_intern+0xb5 (stringTable.cpp:379) > V [jvm.dll+0xdba9f2] StringTable::intern+0x1b2 (stringTable.cpp:368) > V [jvm.dll+0xdbaaa6] StringTable::intern+0x86 (stringTable.cpp:328) > V [jvm.dll+0x51c8b1] ConstantPool::string_at_impl+0x1d1 (constantPool.cpp:1251) > V [jvm.dll+0x51b95b] ConstantPool::resolve_string_constants_impl+0xeb (constantPool.cpp:800) > V [jvm.dll+0x4f2f8d] CompileBroker::compile_method+0x31d (compileBroker.cpp:1395) > V [jvm.dll+0x4f3474] CompileBroker::compile_method+0xc4 (compileBroker.cpp:1348) > > These internal allocations can occur before the allocations of the test and thus use up the pre-allocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the [default, shared OOME instance](https://github.com/openjdk/jdk/blob/3d5eeac3a38ece4a23ea6da2dfe5939d64e81cea/src/hotspot/... This pull request has now been integrated. 
Changeset: aafa15fc Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/aafa15fc173af07ebf5361a8c6a09c2a28981c38 Stats: 108 lines in 11 files changed: 54 ins; 40 del; 14 mod 8331208: Memory stress test that checks OutOfMemoryError stack trace fails Reviewed-by: dholmes, never ------------- PR: https://git.openjdk.org/jdk/pull/18925 From mli at openjdk.org Wed May 8 10:21:18 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 10:21:18 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v2] In-Reply-To: References: Message-ID: > Hi, > Can you review this patch to add ReverseBytesV intrinsic? > Thanks. Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - minor fix - merge master - remove reverse bits - fix test filter - fix zvbb flag; fix tests - merge master - ReverseV/ReverseBytesV: Initial Commit ------------- Changes: https://git.openjdk.org/jdk/pull/19120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=01 Stats: 38 lines in 7 files changed: 34 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19120/head:pull/19120 PR: https://git.openjdk.org/jdk/pull/19120 From fyang at openjdk.org Wed May 8 10:23:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 8 May 2024 10:23:54 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 06:25:16 GMT, Robbin Ehn wrote: >> Hmm... So I did a quick try on linux-aarch64 invoking `CodeCache::contains` on slow_case_addr and the result is false. Anything I missed? > > The JNI_FastGetField::generate_fast_get_XXX_field0 write the code in a CodeBuffer, this where slow_case_addr points to. > > As I test with ReservedCodeCacheSize=2047M if we generated a li() relocation would fail. (there is usually around 120 MB between them) > > Added your assert, it passes also. 
> > ExternalAddress target(slow_case_addr); > + assert(CodeCache::contains(slow_case_addr), "Must be"); > __ relocate(target.rspec(), [&] { > > > > I guess you are running on apple, the code concerning "static_fast_get_field_wrapper" smells. > Maybe you found a bug here. Try this: `make test TEST="runtime/jni/FastGetField/FastGetField.java"` The assertion failed on both linux-aarch64 and linux-riscv64 platforms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1593795773 From mli at openjdk.org Wed May 8 10:24:56 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 10:24:56 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v2] In-Reply-To: References: <71dgzGhNrtu95zP0OtGxZ-cxJ3kwGcV2lbF8oHcjeBM=.05f90142-3bbb-49d5-b2a2-c41408a90b19@github.com> Message-ID: On Tue, 7 May 2024 20:02:32 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/globals_riscv.hpp line 118: >> >>> 116: product(bool, UseZihintpause, false, EXPERIMENTAL, \ >>> 117: "Use Zihintpause instructions") \ >>> 118: product(bool, UseZvbb, false, "Use Zvbb instructions") \ >> >> That'll conflict with https://github.com/openjdk/jdk/pull/19065, but same, we'd want to have `EXPERIMENTAL` > > Yes, this will be fixed after https://github.com/openjdk/jdk/pull/19065 by merging from master. Now, merged/fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593797197 From jsjolen at openjdk.org Wed May 8 10:25:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 8 May 2024 10:25:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v69] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. 
This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
> > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: clangd messed up automatic refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/693423ed..3d8875df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=68 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=67-68 Stats: 27 lines in 2 files changed: 4 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From luhenry at openjdk.org Wed May 8 10:33:53 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 8 May 2024 10:33:53 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v2] In-Reply-To: References: <71dgzGhNrtu95zP0OtGxZ-cxJ3kwGcV2lbF8oHcjeBM=.05f90142-3bbb-49d5-b2a2-c41408a90b19@github.com> Message-ID: On Tue, 7 May 2024 20:02:50 GMT, Hamlin Li wrote: >> src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 182: >> >>> 180: } >>> 181: if (is_set(RISCV_HWPROBE_KEY_IMA_EXT_0, RISCV_HWPROBE_EXT_ZVBB)) { >>> 182: VM_Version::ext_Zvbb.enable_feature(); >> >> Same as https://github.com/openjdk/jdk/pull/19065, we don't want to enable experimental extensions via hwprobe. > > Same as above. Thanks! That is still leftover from merging master. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593807443 From luhenry at openjdk.org Wed May 8 10:33:55 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 8 May 2024 10:33:55 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v2] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:21:18 GMT, Hamlin Li wrote: >> Hi, >> Can you review this patch to add ReverseBytesV intrinsic? >> Thanks. > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - minor fix > - merge master > - remove reverse bits > - fix test filter > - fix zvbb flag; fix tests > - merge master > - ReverseV/ReverseBytesV: Initial Commit test/hotspot/jtreg/compiler/vectorapi/VectorReverseBytesTest.java line 46: > 44: * @requires vm.compiler2.enabled > 45: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx2.*") | > 46: * os.arch == "aarch64" | You can avoid the newline here ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593807869 From mli at openjdk.org Wed May 8 10:42:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 10:42:10 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v3] In-Reply-To: References: Message-ID: > Hi, > Can you review this patch to add ReverseBytesV intrinsic? > Thanks. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19120/files - new: https://git.openjdk.org/jdk/pull/19120/files/4f8a3a53..d4e56eac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=01-02 Stats: 5 lines in 2 files changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19120/head:pull/19120 PR: https://git.openjdk.org/jdk/pull/19120 From mli at openjdk.org Wed May 8 10:42:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 10:42:11 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v2] In-Reply-To: References: Message-ID: <7u8JXQyyB5iVAPDC-r7-pXSzF8OKCOTgPv355zj9IZY=.8568a019-0b77-427d-aa0a-7ea2a8506fb1@github.com> On Wed, 8 May 2024 10:30:45 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - minor fix >> - merge master >> - remove reverse bits >> - fix test filter >> - fix zvbb flag; fix tests >> - merge master >> - ReverseV/ReverseBytesV: Initial Commit > > test/hotspot/jtreg/compiler/vectorapi/VectorReverseBytesTest.java line 46: > >> 44: * @requires vm.compiler2.enabled >> 45: * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx2.*") | >> 46: * os.arch == "aarch64" | > > You can avoid the newline here > That is still leftover from merging master. Thanks for catching, fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593819899 From tholenstein at openjdk.org Wed May 8 10:44:52 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 8 May 2024 10:44:52 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 21:53:18 GMT, Dean Long wrote: >> src/hotspot/share/jfr/support/jfrIntrinsics.cpp line 77: >> >>> 75: void* JfrIntrinsicSupport::return_lease(JavaThread* jt) { >>> 76: DEBUG_ONLY(assert_precondition(jt);) >>> 77: MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, jt)); >> >> It seems like this could be moved down. It doesn't seem to be needed for the Java --> native transition. Is it needed for the JfrJavaEventWriter::flush() call? > > If it is only needed for the native --> Java transition below, why don't we do it lazily? The interpreter and compilers already do this by calling check_special_condition_for_native_trans() only if a safepoint is detected. > Normally we would want to be in the WXExec state when executing in _thread_in_native. `WXWrite` is needed for JfrIntrinsicSupport::return_lease -> ThreadStateTransition::transition_from_native -> SafepointMechanism::process_if_requested_with_exit_check -> SafepointMechanism::process_if_requested -> JavaThread::check_possible_safepoint -> assert_wx_state(WXWrite) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19102#discussion_r1593825099 From tholenstein at openjdk.org Wed May 8 10:47:56 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 8 May 2024 10:47:56 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:42:40 GMT, Tobias Holenstein wrote: >> If it is only needed for the native --> Java transition below, why don't we do it lazily? 
The interpreter and compilers already do this by calling check_special_condition_for_native_trans() only if a safepoint is detected. >> Normally we would want to be in the WXExec state when executing in _thread_in_native. > > `WXWrite` is needed for > > > JfrIntrinsicSupport::return_lease -> > ThreadStateTransition::transition_from_native -> > SafepointMechanism::process_if_requested_with_exit_check -> > SafepointMechanism::process_if_requested -> > JavaThread::check_possible_safepoint -> > assert_wx_state(WXWrite) > Normally we would want to be in the WXExec state when executing in _thread_in_native. I agree. So we would need to acquire `WXWrite` twice: once for `ThreadStateTransition::transition_from_java` and again for `ThreadStateTransition::transition_from_native`. I think it's a bit unfortunate that `WXWrite` is needed for the state transition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19102#discussion_r1593828011 From kevinw at openjdk.org Wed May 8 10:49:57 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 May 2024 10:49:57 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v10] In-Reply-To: References: Message-ID: <90cep4r29bNWe507rQeeN-28VY0TaLpFpjYDrc6qwMY=.d53a6eaa-24e9-4f44-b483-dbcbca2afc0b@github.com> On Wed, 8 May 2024 09:59:43 GMT, Dean Long wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> null nullptr oops > > src/hotspot/share/runtime/vframeArray.cpp line 95: > >> 93: BasicObjectLock* dest = _monitors->at(index); >> 94: if (monitor->owner_is_scalar_replaced()) { >> 95: dest->set_obj(nullptr); > > It looks like there is an existing bug that allows dest->lock() to be uninitialized here, which could cause problems later on in unpack_on_stack if move_to sees a "neutral" value and tries to inflate. I created https://bugs.openjdk.org/browse/JDK-8331918 so this can be looked at separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1593830347 From jsjolen at openjdk.org Wed May 8 10:51:28 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 8 May 2024 10:51:28 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v70] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. 
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Push all reporting intoMemoryFileTracker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/3d8875df..150d17cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=69 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=68-69 Stats: 22 lines in 3 files changed: 13 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From fyang at openjdk.org Wed May 8 11:08:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 8 May 2024 11:08:54 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:42:10 GMT, Hamlin Li wrote: >> Hi, >> Can you review this patch to add ReverseBytesV intrinsic? >> Thanks. 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix merge src/hotspot/cpu/riscv/assembler_riscv.hpp line 1891: > 1889: // Vector Bit-manipulation used in Cryptography (Zvkb) Extension > 1890: INSN(vbrev_v, 0b1010111, 0b010, 0b01010, 0b010010); // reverse bits in every element > 1891: INSN(vbrev8_v, 0b1010111, 0b010, 0b01000, 0b010010); // reverse btis in every byte of element Typo: s/btis/bits/ src/hotspot/cpu/riscv/riscv_v.ad line 3762: > 3760: // -------------------------------- Reverse Bytes Vector Operations ------------------------ > 3761: > 3762: instruct vreverse_bytes_mask(vReg dst, vReg src, vRegMask_V0 v0) %{ Suggestion: s/vpopcount_mask/vpopcount_masked/ Maybe you can rename `vpopcount_mask` to `vpopcount_masked` as well? src/hotspot/cpu/riscv/riscv_v.ad line 3764: > 3762: instruct vreverse_bytes_mask(vReg dst, vReg src, vRegMask_V0 v0) %{ > 3763: match(Set dst (ReverseBytesV src v0)); > 3764: format %{ "vector_reverse_byte $dst, $src, v0.t" %} Suggestion: format %{ "vreverse_bytes_masked $dst, $src, $v0" %} src/hotspot/cpu/riscv/riscv_v.ad line 3766: > 3764: format %{ "vector_reverse_byte $dst, $src, v0.t" %} > 3765: ins_encode %{ > 3766: __ vrev8_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg), __ VectorMask::v0_t); I think we should call `vsetvli_helper(bt, vlen)` for both newly-added instructs. Also, can we use `Assembler::v0_t` here like you do in https://github.com/openjdk/jdk/pull/19065 for consistency? 
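For readers following the review above, the difference between the two instructions mentioned (`vrev8_v` reversing the byte order of each element, `vbrev8_v` reversing the bits inside each byte) may be easier to see in a scalar model. The sketch below is not code from the patch: the function names are made up for illustration, elements are fixed at 32 bits, and the masked variant assumes that inactive lanes simply keep their input value, which is an assumption about the intended semantics rather than something stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of reversing the byte order of one 32-bit element
// (the per-element effect attributed to vrev8_v in the review).
inline uint32_t reverse_bytes32(uint32_t x) {
  return ((x & 0x000000FFu) << 24) |
         ((x & 0x0000FF00u) << 8)  |
         ((x & 0x00FF0000u) >> 8)  |
         ((x & 0xFF000000u) >> 24);
}

// Scalar model of reversing the bits inside each byte while the bytes
// stay in place (the effect the quoted comment gives for vbrev8_v).
inline uint32_t reverse_bits_in_each_byte32(uint32_t x) {
  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);  // swap adjacent bits
  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);  // swap bit pairs
  x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);  // swap nibbles
  return x;
}

// Hypothetical masked, lane-wise byte reverse.
// ASSUMPTION: lanes whose mask bit is false keep their source value.
inline std::vector<uint32_t> reverse_bytes_masked(const std::vector<uint32_t>& src,
                                                  const std::vector<bool>& mask) {
  assert(src.size() == mask.size());
  std::vector<uint32_t> dst(src.size());
  for (size_t i = 0; i < src.size(); i++) {
    dst[i] = mask[i] ? reverse_bytes32(src[i]) : src[i];
  }
  return dst;
}
```

With these definitions, `reverse_bytes32(0x11223344)` gives `0x44332211`, while `reverse_bits_in_each_byte32(0x01020304)` gives `0x8040C020`: the bytes stay where they are and only the bits within each byte are mirrored.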
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593830691 PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593842831 PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593844198 PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593838481 From jsjolen at openjdk.org Wed May 8 11:16:31 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 8 May 2024 11:16:31 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v71] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Swap around MEMFLAGS and NCS in allocate_memory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/150d17cd..a843a9e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=70 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=69-70 Stats: 14 lines in 5 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From kevinw at openjdk.org Wed May 8 11:16:57 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 May 2024 11:16:57 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v6] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 02:20:34 GMT, Dean Long wrote: >> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: >> >> monitor->owner() == nullptr handling in fill_in > > src/hotspot/share/runtime/vframeArray.cpp line 97: > >> 95: dest->set_obj(nullptr); >> 96: } else { >> 97: assert(!monitor->owner()->is_unlocked(), "object must be null or locked"); > > Suggestion: > > assert(monitor->owner() != nullptr, "monitor owner must not be null"); > assert(!monitor->owner()->is_unlocked(), "monitor must be locked"); Yes, done. 
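The suggestion quoted above splits one assertion into two, so that a null owner and an unlocked owner fail with different messages and the owner is never dereferenced before it has been checked. A minimal sketch of that shape, using toy stand-in types, might look as follows. `ToyObject`, `ToyMonitorInfo`, `ToySlot` and `toy_fill_in` are hypothetical names invented for this illustration; they are not the HotSpot classes and do not reproduce the real `fill_in` logic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins, invented for illustration only.
struct ToyObject {
  bool locked;
};

struct ToyMonitorInfo {
  ToyObject* owner;            // may be null when the owner was scalar replaced
  bool owner_scalar_replaced;  // models the scalar-replaced case from the thread
};

struct ToySlot {
  ToyObject* obj = nullptr;
};

// Copies monitor owners into slots. Scalar-replaced owners become null slots;
// every other owner must be non-null and locked, checked by two separate asserts.
inline std::vector<ToySlot> toy_fill_in(const std::vector<ToyMonitorInfo>& monitors) {
  std::vector<ToySlot> slots(monitors.size());
  for (size_t i = 0; i < monitors.size(); i++) {
    const ToyMonitorInfo& m = monitors[i];
    if (m.owner_scalar_replaced) {
      slots[i].obj = nullptr;
    } else {
      // First assert guards the dereference performed by the second one.
      assert(m.owner != nullptr && "monitor owner must not be null");
      assert(m.owner->locked && "monitor must be locked");
      slots[i].obj = m.owner;
    }
  }
  return slots;
}
```

Keeping the null check first means a failure report points at the actual broken invariant instead of crashing inside the second check.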
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1593858530 From kevinw at openjdk.org Wed May 8 11:19:55 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 8 May 2024 11:19:55 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v10] In-Reply-To: <2YLBGUEZCYrtkqsT4jCdmLrNzGhdsN8U6YOY_hqkbo0=.597115db-ac29-4fa3-84c3-b713b5652c85@github.com> References: <4NzfdylxvqETF87l3E4O3XdBMInuP7_8S9mhS6tN0QA=.cc497605-246b-4ebc-9816-09b384683e0d@github.com> <4U-AP8zHxJrxwXYoTcxlpn5OvztYUW-ijTAd5TJ3I_4=.731aeb9c-115a-40c8-9298-577e0fada9ce@github.com> <2YLBGUEZCYrtkqsT4jCdmLrNzGhdsN8U6YOY_hqkbo0=.597115db-ac29-4fa3-84c3-b713b5652c85@github.com> Message-ID: On Thu, 2 May 2024 19:37:31 GMT, Kevin Walls wrote: >> I assume it's only for the `fill_in` `realloc_failures` case. But you're right, it doesn't seem very useful. It's just going to look like an unlocked monitor slot in the interpreter frame. We could consider skipping these in `fill_in`, then they won't show up later in `unpack_on_stack()`. > > fill_in() has previously been OK with seeing monitor->owner() == nullptr > so it's already setting dest->set_obj(null) under some conditions. > > I see we can handle the null separately and simplify the asserts there. > > vframeArrayElement::unpack_on_stack() > Still might retrieve a null, so the asserts there keep the guard against doing the owner check -- I'm not sure if they won't show up there - it loops over the number of elements in the MonitorChunk* so it should see them all? 
(marking as resolved, null checks are back in these asserts, there could be a null there from the is_scalar_replaced case, and possibly if realloc_failures is true) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1593862180 From jsjolen at openjdk.org Wed May 8 11:24:19 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 8 May 2024 11:24:19 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v72] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void 
ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... 
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Oh man, almost made a refactoring without a mistake :-) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/a843a9e4..3b9bb9fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=71 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=70-71 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Wed May 8 11:24:20 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 8 May 2024 11:24:20 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v46] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 13:09:04 GMT, Afshin Zafari wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove faulty condition after removing merging >> - Add failing test case > > src/hotspot/share/gc/z/zNMT.cpp line 45: > >> 43: >> 44: void ZNMT::commit(zoffset offset, size_t size) { >> 45: MemTracker::allocate_memory_in(ZNMT::_device, untype(offset), size, mtJavaHeap, CALLER_PC); > > `NativeCallStack` param should be before the `MEMFLAGS` param, same as other functions. Fixed. 
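The PR description in this thread talks about committing memory by offset into a separate address space and adding the committed bytes to a per-flag summary. As a rough illustration of that idea only, the sketch below keeps a sorted map of state changes per offset. It is not the patch's implementation: the description says the patch uses a treap-based tree, whereas this sketch swaps in `std::map`, drops call stacks entirely, and uses invented names (`ToyFlag`, `ToyFileTracker`).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical memory categories standing in for MEMFLAGS values.
enum class ToyFlag { None, JavaHeap, Other };

// Tracks which offset ranges of one "memory file" are committed, and to which flag.
class ToyFileTracker {
  // Key: offset where a state starts. Value: flag in effect from that offset
  // up to the next key. Offsets before the first key are uncommitted.
  std::map<size_t, ToyFlag> _bounds;

  // Flag in effect at a given offset.
  ToyFlag flag_at(size_t pos) const {
    auto it = _bounds.upper_bound(pos);
    if (it == _bounds.begin()) return ToyFlag::None;
    return std::prev(it)->second;
  }

  // Overwrites the half-open range [from, to) with flag f.
  void assign(size_t from, size_t to, ToyFlag f) {
    if (from >= to) return;
    ToyFlag after = flag_at(to);  // state that must resume at 'to'
    _bounds.erase(_bounds.lower_bound(from), _bounds.upper_bound(to));
    _bounds[from] = f;
    _bounds[to] = after;
  }

public:
  void allocate_memory(size_t offset, size_t size, ToyFlag f) {
    assign(offset, offset + size, f);
  }

  void free_memory(size_t offset, size_t size) {
    assign(offset, offset + size, ToyFlag::None);
  }

  // Summary view: total committed bytes attributed to one flag.
  size_t committed(ToyFlag f) const {
    size_t total = 0;
    for (auto it = _bounds.begin(); it != _bounds.end(); ++it) {
      auto next = std::next(it);
      if (next != _bounds.end() && it->second == f) {
        total += next->first - it->first;
      }
    }
    return total;
  }
};
```

Committing offsets 0 to 10 as `JavaHeap`, freeing 3 bytes at offset 2, and then committing 4 bytes at offset 8 as `Other` leaves 5 bytes attributed to `JavaHeap` and 4 to `Other`. Nothing here records how the file is mapped into virtual memory, in line with the statement in the description that mappings are not tracked.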
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1593866154

From eosterlund at openjdk.org Wed May 8 11:30:54 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Wed, 8 May 2024 11:30:54 GMT
Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2]
In-Reply-To:
References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com>
Message-ID:

On Wed, 8 May 2024 06:10:52 GMT, Liang Mao wrote:

>> I think that store capturing of initializing stores already removes most of the barriers of this category. We do that a bit later on. We find initializing stores onto newly allocated objects, and replace the store with barriers, with a store without barriers. That one usually elides a large portion of store barriers. Did you find any example where you have a newly allocated object with stores that are not initializing, hence not elided, which unnecessarily invoked a pre-write barrier, but not a post-write barrier? Just trying to find out what the problem space is, that this fixes.
>
>> I think that store capturing of initializing stores already removes most of the barriers of this category. We do that a bit later on. We find initializing stores onto newly allocated objects, and replace the store with barriers, with a store without barriers. That one usually elides a large portion of store barriers. Did you find any example where you have a newly allocated object with stores that are not initializing, hence not elided, which unnecessarily invoked a pre-write barrier, but not a post-write barrier? Just trying to find out what the problem space is, that this fixes.
>
> Hi Erik, I found examples not filtered by g1_can_remove_pre_barrier in testing.
> But I just did some statistics on SPECjbb2015 showing that if g1_can_remove_pre_barrier ran first it would elide most of the pre-barriers and "obj == kit->just_allocated_object" only found very few remaining opportunities. If we run condition "obj == kit->just_allocated_object" first, it would cover ~30% of the opportunities. I think technically this PR should be correct but it's up to reviewers to decide if we practically need it.

Did you check how many of the stores where g1_can_remove_pre_barrier said false and you would have said true, were elided anyway during store capturing (cf. InitializeNode::capture_store), or as part of G1BarrierSetC2::eliminate_gc_barrier? In other words, how many barriers are you eliding, that were not in fact already elided, just a bit later on?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1593874081

From jsjolen at openjdk.org Wed May 8 11:31:21 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 8 May 2024 11:31:21 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v73]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and commit memory to them. The basic API is:
>
> ```c++
>     static MemoryFile* make_device(const char* descriptive_name);
>     static void free_device(MemoryFile* device);
>
>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                                 MEMFLAGS flag, const NativeCallStack& stack);
>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
> Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Rename nmtMemoryFileTracker to skip nmt prefix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/3b9bb9fa..80d408d0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=72
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=71-72

  Stats: 11 lines in 7 files changed: 1 ins; 3 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org Wed May 8 11:53:17 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 8 May 2024 11:53:17 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and commit memory to them. The basic API is:
>
> ```c++
>     static MemoryFile* make_device(const char* descriptive_name);
>     static void free_device(MemoryFile* device);
>
>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                                 MEMFLAGS flag, const NativeCallStack& stack);
>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
> Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Some style

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/80d408d0..137af84f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=73
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=72-73

  Stats: 6 lines in 1 file changed: 1 ins; 2 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org Wed May 8 12:03:55 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 8 May 2024 12:03:55 GMT
Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 7 May 2024 03:29:25 GMT, Ioi Lam wrote:

>> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text)
>>
>> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example:
>>
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066
>>
>> Common problems:
>> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors.
>> - There's ad-hoc code that deals with `FILE*` differently than from memory.
>>
>> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`).
>> We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs.
>>
>> The API allows other sources of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it.
>>
>> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.)
>>
>> Credit:
>> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html .
>>
>> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line at a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
>   No need to call set_input(null_ptr) from inputStream destructor

Please apply my suggestions: we should use `override` rather than `virtual` when implementing virtual functions. Other than those two suggestions, the code looks good to me. Thanks for your efforts in making this happen!

src/hotspot/share/utilities/istream.hpp line 353:

> 351:
> 352:  protected:
> 353:   virtual size_t read(char* buf, size_t size) {

Suggestion:

  size_t read(char* buf, size_t size) override {

src/hotspot/share/utilities/istream.hpp line 373:

> 371:
> 372:  protected:
> 373:   virtual size_t read(char* buf, size_t size) {

Suggestion:

  size_t read(char* buf, size_t size) override {

-------------

Marked as reviewed by jsjolen (Reviewer).
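
The `override` suggestion above can be illustrated with a minimal sketch. The classes `Input` and `MemorySource` and the helper `read_via_base` are hypothetical names invented for this example; they are not the `inputStream` classes from the PR, and `read` is public here only to keep the sketch short.

```c++
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for an input source; not the actual HotSpot class.
class Input {
 public:
  virtual ~Input() = default;
  // Fill buf with up to size bytes and return the number of bytes produced.
  virtual std::size_t read(char* buf, std::size_t size) = 0;
};

// Hypothetical in-memory source used only for this illustration.
class MemorySource : public Input {
  const char* _base;
  std::size_t _limit;
  std::size_t _offset = 0;

 public:
  MemorySource(const char* base, std::size_t limit) : _base(base), _limit(limit) {}

  // With `override`, the compiler rejects this declaration if it no longer
  // matches a virtual function in Input (for example after a signature change).
  // A plain `virtual` here would silently declare a new, unrelated function.
  std::size_t read(char* buf, std::size_t size) override {
    std::size_t remaining = _limit - _offset;
    std::size_t n = size < remaining ? size : remaining;
    std::memcpy(buf, _base + _offset, n);
    _offset += n;
    return n;
  }
};

// Reads through the base-class interface, so the call is dispatched virtually.
inline std::size_t read_via_base(Input& in, char* buf, std::size_t size) {
  assert(buf != nullptr);
  return in.read(buf, size);
}
```

If the base-class signature were later changed (say, `size` became an `int`), the `override` version would fail to compile, whereas a `virtual`-only declaration would keep compiling and never be called through the base class.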
PR Review: https://git.openjdk.org/jdk/pull/18833#pullrequestreview-2045461103
PR Comment: https://git.openjdk.org/jdk/pull/18833#issuecomment-2100420229
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1593912748
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1593913186

From rehn at openjdk.org Wed May 8 12:35:52 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 8 May 2024 12:35:52 GMT
Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 8 May 2024 10:21:08 GMT, Fei Yang wrote:

>> The JNI_FastGetField::generate_fast_get_XXX_field0 writes the code in a CodeBuffer; this is where slow_case_addr points to.
>>
>> As I test with ReservedCodeCacheSize=2047M, if we generated a li() relocation would fail. (there is usually around 120 MB between them)
>>
>> Added your assert, it passes also.
>>
>>   ExternalAddress target(slow_case_addr);
>> + assert(CodeCache::contains(slow_case_addr), "Must be");
>>   __ relocate(target.rspec(), [&] {
>>
>>
>>
>> I guess you are running on apple, the code concerning "static_fast_get_field_wrapper" smells.
>> Maybe you found a bug here.
>
> Try this: `make test TEST="runtime/jni/FastGetField/FastGetField.java"`
> The assertion failed on both linux-aarch64 and linux-riscv64 platforms.

If `-UseFastJNIAccessors or +VerifyJNIFields or +CheckJNICalls` we never use the stubs.
In this case the JVM TI agent has callbacks since it has field watches, so it turns the stubs off.

Good catch, thanks!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1593956722

From rehn at openjdk.org Wed May 8 12:44:52 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 8 May 2024 12:44:52 GMT
Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 8 May 2024 12:33:33 GMT, Robbin Ehn wrote:

>> Try this: `make test TEST="runtime/jni/FastGetField/FastGetField.java"`
>> The assertion failed on both linux-aarch64 and linux-riscv64 platforms.
>
> If `-UseFastJNIAccessors or +VerifyJNIFields or +CheckJNICalls` we never use the stubs.
> In this case the JVM TI agent has callbacks since it has field watches, so it turns the stubs off.
>
> Good catch, thanks!

As you turn off "fast JNI" with those options, I think it's fine to use rt_call, which would do movptr in this case.
If you wanted fast JNI you shouldn't turn it off.

But I can revert to la+jalr if you feel strongly about it?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1593967094

From mli at openjdk.org Wed May 8 12:49:06 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 8 May 2024 12:49:06 GMT
Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v4]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you review this patch to add ReverseBytesV intrinsic?
> Thanks.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  fix misc

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19120/files
  - new: https://git.openjdk.org/jdk/pull/19120/files/d4e56eac..cee9b99b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=02-03

  Stats: 11 lines in 2 files changed: 6 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/19120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19120/head:pull/19120

PR: https://git.openjdk.org/jdk/pull/19120

From mli at openjdk.org Wed May 8 12:52:10 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 8 May 2024 12:52:10 GMT
Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v5]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you review this patch to add ReverseBytesV intrinsic?
> Thanks.

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  minor

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19120/files
  - new: https://git.openjdk.org/jdk/pull/19120/files/cee9b99b..addb1441

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19120/head:pull/19120

PR: https://git.openjdk.org/jdk/pull/19120

From mli at openjdk.org Wed May 8 12:52:10 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 8 May 2024 12:52:10 GMT
Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 8 May 2024 10:55:22 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix merge
>
> src/hotspot/cpu/riscv/riscv_v.ad line 3766:
>
>> 3764:   format %{ "vector_reverse_byte $dst, $src, v0.t" %}
>> 3765:   ins_encode %{
>> 3766:     __ vrev8_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg), __ VectorMask::v0_t);
>
> I think we should call `vsetvli_helper(bt, vlen)` for both newly-added instructs. Also, can we use `Assembler::v0_t` instead of `__ VectorMask::v0_t` here like you do in https://github.com/openjdk/jdk/pull/19065 for consistency?

Thanks for catching! Others are also fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1593977034

From aboldtch at openjdk.org Wed May 8 13:21:54 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 8 May 2024 13:21:54 GMT
Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v6]
In-Reply-To:
References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> <1BfXhOjLGLcOI7zwKBMp5MiE0NgHRJ0rTfgDBRDIDPw=.a9ac9448-5f14-44b7-b4a3-d831b596e6d6@github.com>
Message-ID:

On Wed, 8 May 2024 08:18:30 GMT, Roman Kennke wrote:

>> Also right now the `length_is_8aligned` very much just looks like a `!oopDesc::has_klass_gap()` check.
>>
>> The validity of choosing to start from the klass when `!length_is_8aligned` is that we must then be using `UseCompressedClassPointers /* && !UseCompactObjectHeaders for Lilliput */`. Until Lilliput, starting from the Klass is valid for both `+UseCompressedClassPointers` and `-UseCompressedClassPointers` as the klass, length and base will always be tightly packed (no padding) for byte and char type arrays.
>>
>> In all modes what we effectively do is `int start_offset = align_down(length_offset, BytesPerWord )`. Not sure if the intent gets clearer then. Something along the lines of:
>> Suggestion:
>>
>>     // When the length offset is not aligned to 8 bytes,
>>     // then we align it down, this is valid as the new
>>     // offset will always be the klass which is the same
>>     // for type arrays.
>>     int start_offset = align_down(length_offset, BytesPerWord);
>>     int extra_length = base_offset - start_offset;
>>     assert(start_offset == length_offset || start_offset == klass_offset,
>>            "start offset must be 8-byte-aligned or be the klass offset");
>>     assert(base_offset != start_offset, "must include the length field");
>
> Right, not assuming layout of Klass* seems saner.
> BTW, including the Klass* is not only safe because it's only char[] or byte[], it's also already guaranteed to be the same by the calling APIs. And even if it were not (arraycopy-style) - if the Klass* could be different, then array-equality would have to check this, also, anyway.

Maybe we are talking about different things, I am also unsure which method is actually intrinsified here. If it actually is `java.util.Arrays.equals(...)` then it does have the property that it does not really care about the klass. (But the method overloading ensures that you only compare an array object of one type with another object of the same type).

    class Main {
      public static void main(String args[]) {
        final int size = 10;
        Integer iArray[] = new Integer[size];
        Object oArray[] = new Object[size];
        System.out.println(iArray.equals(oArray)); // Prints "false"
        System.out.println(oArray.equals(iArray)); // Prints "false"
        System.out.println(java.util.Arrays.equals(iArray, oArray)); // Prints "true"
        System.out.println(iArray.getClass() == oArray.getClass()); // Prints "false"
      }
    };

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18948#discussion_r1594020876

From fyang at openjdk.org Wed May 8 13:24:55 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 8 May 2024 13:24:55 GMT
Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 8 May 2024 12:52:10 GMT, Hamlin Li wrote:

>> Hi,
>> Can you review this patch to add ReverseBytesV intrinsic?
>> Thanks.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   minor

Thanks for the quick update. LGTM.

src/hotspot/cpu/riscv/riscv_v.ad line 3771:

> 3769:     __ vrev8_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg), Assembler::v0_t);
> 3770:   %}
> 3771:   ins_pipe( pipe_slow );

Nit: Better to remove the redundant spaces around pipe_slow.

src/hotspot/cpu/riscv/riscv_v.ad line 3783:

> 3781:     __ vrev8_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg));
> 3782:   %}
> 3783:   ins_pipe( pipe_slow );

Similar here.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19120#pullrequestreview-2045642522
PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1594021889
PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1594022483

From asmehra at openjdk.org Wed May 8 13:27:05 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Wed, 8 May 2024 13:27:05 GMT
Subject: RFR: 8330275: Crash in XMark::follow_array [v4]
In-Reply-To:
References:
Message-ID:

> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail.
> Encoding of partial array offset in mark stack requires that the address offset be no more than 44 bits. But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return a value larger than 44 bits. This patch sets the maximum address offset bits to 44.
>
> I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.
>
> I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used.
> This would avoid running into such a problem again if in the future the maximum address offset bits is increased beyond 44.
>
> For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generational ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return a value greater than 44 bits.
>
> Testing: test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86

Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision:

 - Set max addressable bit for zgc to 46 on aarch64

   Signed-off-by: Ashutosh Mehra
 - Revert all previous changes

   Signed-off-by: Ashutosh Mehra

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18941/files
  - new: https://git.openjdk.org/jdk/pull/18941/files/4888ce19..198c8b41

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18941&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18941&range=02-03

  Stats: 42 lines in 10 files changed: 12 ins; 16 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/18941.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18941/head:pull/18941

PR: https://git.openjdk.org/jdk/pull/18941

From mli at openjdk.org Wed May 8 13:29:18 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 8 May 2024 13:29:18 GMT
Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v6]
In-Reply-To:
References:
Message-ID: <0CfgzUKbuhVHe1V1v03tuRv1VaYvEkWYSyuJAW7oiCk=.5487fa17-7163-4d12-aa6f-6c4bfe45373b@github.com>

> Hi,
> Can you review this patch to add ReverseBytesV intrinsic?
> Thanks.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  space

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19120/files
  - new: https://git.openjdk.org/jdk/pull/19120/files/addb1441..ad9c077d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19120&range=04-05

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19120.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19120/head:pull/19120

PR: https://git.openjdk.org/jdk/pull/19120

From mli at openjdk.org Wed May 8 13:29:18 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 8 May 2024 13:29:18 GMT
Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v5]
In-Reply-To:
References:
Message-ID: <-AvA91wTgy5-GV8K7BhZy0gf_t6rnRqpwxyO_a-xhTo=.d3a33ca6-73a8-4305-a635-040b05a645c8@github.com>

On Wed, 8 May 2024 13:19:43 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   minor
>
> src/hotspot/cpu/riscv/riscv_v.ad line 3771:
>
>> 3769:     __ vrev8_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg), Assembler::v0_t);
>> 3770:   %}
>> 3771:   ins_pipe( pipe_slow );
>
> Nit: Better to remove the redundant spaces around pipe_slow.

done!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19120#discussion_r1594030510

From asmehra at openjdk.org Wed May 8 13:29:55 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Wed, 8 May 2024 13:29:55 GMT
Subject: RFR: 8330275: Crash in XMark::follow_array [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 25 Apr 2024 14:28:47 GMT, Ashutosh Mehra wrote:

>> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail.
>> Encoding of partial array offset in mark stack requires that the address offset be no more than 44 bits. But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return a value larger than 44 bits. This patch sets the maximum address offset bits to 44.
>>
>> I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.
>>
>> I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. This would avoid running into such a problem again if in the future the maximum address offset bits is increased beyond 44.
>>
>> For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generational ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return a value greater than 44 bits.
>>
>> Testing: test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86
>
> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix typos
>
>   Signed-off-by: Ashutosh Mehra

Sorry for the long absence on this PR. I have updated the PR to just do a point fix for aarch64.
I have also done tier1, tier2 and tier3 tests on aarch64.
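
To make the 44-bit limit mentioned in the quoted description concrete, here is a minimal sketch of a fixed-width offset field. The layout, the constant `kOffsetBits`, and the functions `offset_fits`, `encode`, `decode_offset` and `decode_length` are all assumptions made for this illustration; this is not the actual ZGC mark stack entry format.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical layout: the low kOffsetBits bits hold an address offset and
// the remaining high bits hold a length. NOT the real ZGC encoding.
constexpr unsigned kOffsetBits = 44;
constexpr std::uint64_t kOffsetMask = (std::uint64_t{1} << kOffsetBits) - 1;

// True if the offset can be stored in the field without losing bits.
constexpr bool offset_fits(std::uint64_t offset) {
  return (offset & ~kOffsetMask) == 0;
}

// Packs offset and length into one 64-bit entry. An offset that needs more
// than kOffsetBits bits cannot be represented, which is the failure mode the
// description refers to when address offsets grow past the field width.
inline std::uint64_t encode(std::uint64_t offset, std::uint64_t length) {
  assert(offset_fits(offset));
  assert(length < (std::uint64_t{1} << (64 - kOffsetBits)));
  return (length << kOffsetBits) | offset;
}

constexpr std::uint64_t decode_offset(std::uint64_t entry) {
  return entry & kOffsetMask;
}

constexpr std::uint64_t decode_length(std::uint64_t entry) {
  return entry >> kOffsetBits;
}
```

Under these assumptions, any offset of `1 << 44` or more is rejected by `offset_fits`, so either the platform's address offset bits must be capped or the field must be widened.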
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18941#issuecomment-2100575661

From stefank at openjdk.org Wed May 8 13:38:54 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 8 May 2024 13:38:54 GMT
Subject: RFR: 8330275: Crash in XMark::follow_array [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 8 May 2024 13:27:05 GMT, Ashutosh Mehra wrote:

>> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail.
>> Encoding of partial array offset in mark stack requires that the address offset be no more than 44 bits. But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return a value larger than 44 bits. This patch sets the maximum address offset bits to 44.
>>
>> I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.
>>
>> I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. This would avoid running into such a problem again if in the future the maximum address offset bits is increased beyond 44.
>>
>> For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generational ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return a value greater than 44 bits.
>>
>> Testing: test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86
>
> Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Set max addressable bit for zgc to 46 on aarch64
>
>    Signed-off-by: Ashutosh Mehra
>  - Revert all previous changes
>
>    Signed-off-by: Ashutosh Mehra

Looks good.
FWIW, the 128TB and 64GB numbers are just confusing when we are talking about a bit position. If the 46th bit succeeds the usable address range is 128TB, and the 46th bit will account for 64TB out of those 128TB. I wouldn't mind at all if we just ripped out the mentions of 64TB and 64GB here. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18941#pullrequestreview-2045683548 From aboldtch at openjdk.org Wed May 8 13:48:54 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 8 May 2024 13:48:54 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v8] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Wed, 8 May 2024 09:18:07 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. 
>>
>> Testing:
>>  - [x] tier1 (+CCP)
>>  - [x] tier1 (-CCP)
>>  - [x] tier2 (+CCP)
>>  - [x] tier2 (-CCP)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
>
>   Co-authored-by: Andrew Haley

Looks good to me. The ` // Increase loop counter by size of length field.` comments are not exactly true, unsure what to write instead.

-------------

Marked as reviewed by aboldtch (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18948#pullrequestreview-2045716594

From asmehra at openjdk.org Wed May 8 14:17:13 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Wed, 8 May 2024 14:17:13 GMT
Subject: RFR: 8330275: Crash in XMark::follow_array [v4]
In-Reply-To:
References:
Message-ID: <92d6LN6MBklY680U1wzWkw8NzTc4WlLQ4C9TsuzqrY0=.33bcc5af-6d55-4456-88d8-37fe27866c30@github.com>

On Wed, 8 May 2024 13:36:11 GMT, Stefan Karlsson wrote:

> I wouldn't mind at all if we just ripped out the mentions of 64TB and 64GB here.

Done

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18941#issuecomment-2100686076

From asmehra at openjdk.org Wed May 8 14:17:13 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Wed, 8 May 2024 14:17:13 GMT
Subject: RFR: 8330275: Crash in XMark::follow_array [v5]
In-Reply-To:
References:
Message-ID:

> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail.
> Encoding of partial array offset in mark stack requires that the max address bit be no more than 46 bits. ~~But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return value larger that 44 bits. This patch sets the maximum address offset bits to 44.~~
>
> ~~I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.~~
>
> ~~I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. This would avoid running into such problem again if in future maximum address offset bits is increased beyond 44.~~
>
> ~~For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generation ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return value greater than 44 bits.~~
>
> Testing: ~~test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86~~ tier1, tier2 and tier3 on aarch64 using fastdebug build with options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as per the suggestion in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275?focusedId=14667864&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14667864))
>
> Update: Struck out the changes that are not relevant now that it is only doing a point fix for aarch64

Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision:

  Remove the confusing comment around addressable memory limit

  Signed-off-by: Ashutosh Mehra

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18941/files
  - new: https://git.openjdk.org/jdk/pull/18941/files/198c8b41..1529968d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18941&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18941&range=03-04

  Stats: 6 lines in 2 files changed: 0 ins; 4 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18941.diff
  Fetch: git fetch
https://git.openjdk.org/jdk.git pull/18941/head:pull/18941 PR: https://git.openjdk.org/jdk/pull/18941 From fyang at openjdk.org Wed May 8 14:19:53 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 8 May 2024 14:19:53 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: References: Message-ID: <7Z672Zbnt7qtcn9vyRvUmEq1mMkxZ0NqzKERxFdeCp4=.c0dd28f7-5f38-4b77-ac8f-e64cf1213314@github.com> On Wed, 8 May 2024 12:41:46 GMT, Robbin Ehn wrote: > But I can revert to la+jalr if you feel strongly about it? Yeah, I would suggest we revert to the original la+jalr for this PR and think more about it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1594113508 From stefank at openjdk.org Wed May 8 14:22:55 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 8 May 2024 14:22:55 GMT Subject: RFR: 8330275: Crash in XMark::follow_array [v5] In-Reply-To: References: Message-ID: <4OWcxO6wrvaJbdiX4yVEiqv_cve3dwU-bYg-LdCAjQ8=.d5e6375b-f756-4d99-9001-5932d6e5adcb@github.com> On Wed, 8 May 2024 14:17:13 GMT, Ashutosh Mehra wrote: >> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail. >> Encoding of partial array offset in mark stack requires that the max address bit be no more than 46 bit. ~~But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return value larger that 44 bits. This patch sets the maximum address offset bits to 44.~~ >> >> ~~I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.~~ >> >> ~~I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. 
This would avoid running into such problem again if in future maximum address offset bits is increased beyond 44.~~ >> >> ~~For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generation ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return value greater than 44 bits.~~ >> >> Testing: ~~test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86~~ tier1, tier2 and tier3 on aarch64 using fastdebug build with options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as per the suggestion in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275?focusedId=14667864&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14667864)) >> >> Update: Striked out the changes that are not relevant now that it is only doing a point fix for aarch64 > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Remove the confusing comment around addressable memory limit > > Signed-off-by: Ashutosh Mehra Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18941#pullrequestreview-2045819607 From asmehra at openjdk.org Wed May 8 14:39:53 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 8 May 2024 14:39:53 GMT Subject: RFR: 8330275: Crash in XMark::follow_array [v3] In-Reply-To: References: Message-ID: On Sat, 27 Apr 2024 05:07:08 GMT, Thomas Stuefe wrote: >> I agree from the point of view of backporting, point-fix is all we need in this PR. >> >> @tstuefe As for the other platforms (riscv and ppc), looking at their code they seem to be broken in the same way as aarch64 but then the problem only happens if the user runs with > 1TB heap size with more than 48 addressable bits. 
>> Again, in the spirit of "do not touch if it is not broken", I am fine if we restrict the change to just aarch64. >> >> @tstuefe @stefank please let me know if you agree with just doing the point-fix to aarch64. > >> I agree from the point of view of backporting, point-fix is all we need in this PR. >> >> @tstuefe As for the other platforms (riscv and ppc), looking at their code they seem to be broken in the same way as aarch64 but then the problem only happens if the user runs with > 1TB heap size with more than 48 addressable bits. Again, in the spirit of "do not touch if it is not broken", I am fine if we restrict the change to just aarch64. >> >> @tstuefe @stefank please let me know if you agree with just doing the point-fix to aarch64. > > Absolutely. We can do any platform testing on other platforms and cleanups in subsequent RFEs. @tstuefe does this look ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18941#issuecomment-2100741953 From rehn at openjdk.org Wed May 8 14:54:55 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 8 May 2024 14:54:55 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v8] In-Reply-To: <7Z672Zbnt7qtcn9vyRvUmEq1mMkxZ0NqzKERxFdeCp4=.c0dd28f7-5f38-4b77-ac8f-e64cf1213314@github.com> References: <7Z672Zbnt7qtcn9vyRvUmEq1mMkxZ0NqzKERxFdeCp4=.c0dd28f7-5f38-4b77-ac8f-e64cf1213314@github.com> Message-ID: <9aD8mbstnaqvPXz1cXZ-F1sRSxhlSRYtDOmq7-kSvzs=.a8c81b7e-9460-4a33-ab3c-16a006d7a7bf@github.com> On Wed, 8 May 2024 14:17:10 GMT, Fei Yang wrote: >> As you turn off "fast JNI" with those option I think it's fine to use rt_call, which would do movptr in this case. >> If you wanted fast JNI you shouldn't turn it off. >> >> But I can revert to la+jalr if you feel strongly about it? > >> But I can revert to la+jalr if you feel strongly about it? > > Yeah, I would suggest we revert to the original la+jalr to keep this PR simple and think more about it. 
> > (And maybe remove the check for li in `call` at the same time?) Yes, let's finish this PR and make other wanted changes in other PR's :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1594178763 From rehn at openjdk.org Wed May 8 14:58:08 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 8 May 2024 14:58:08 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v9] In-Reply-To: References: Message-ID: <8eEHoZJlnp0yoT1RxXMgiNdw3kSRej1SXa3_vbXbZTE=.59dc8086-cc8d-4690-ba6d-5270a3458d57@github.com> > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. > Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. > Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. > The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. 
> - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Revert JNI field, call()->li() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/8408c027..d08afa51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=07-08 Stats: 9 lines in 3 files changed: 2 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From rehn at openjdk.org Wed May 8 15:07:09 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 8 May 2024 15:07:09 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. > Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. 
> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. > The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. > - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into jal-fixes - Revert JNI field, call()->li() - Use li instead of movptr for call - REVERT: Use li instead of movptr - Use li instead of movptr - VM leaf should use li - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - Corrected method name - ... and 2 more: https://git.openjdk.org/jdk/compare/059cc8dc...d53e9694 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/d08afa51..d53e9694 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=08-09 Stats: 2368 lines in 157 files changed: 1554 ins; 392 del; 422 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From stuefe at openjdk.org Wed May 8 15:46:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 May 2024 15:46:59 GMT Subject: RFR: 8330275: Crash in XMark::follow_array [v5] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 14:17:13 GMT, Ashutosh Mehra wrote: >> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail. >> Encoding of partial array offset in mark stack requires that the max address bit be no more than 46 bit. ~~But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return value larger that 44 bits. 
This patch sets the maximum address offset bits to 44.~~ >> >> ~~I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.~~ >> >> ~~I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. This would avoid running into such problem again if in future maximum address offset bits is increased beyond 44.~~ >> >> ~~For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generation ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return value greater than 44 bits.~~ >> >> Testing: ~~test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86~~ tier1, tier2 and tier3 on aarch64 using fastdebug build with options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as per the suggestion in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275?focusedId=14667864&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14667864)) >> >> Update: Striked out the changes that are not relevant now that it is only doing a point fix for aarch64 > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Remove the confusing comment around addressable memory limit > > Signed-off-by: Ashutosh Mehra Looks good, question inline. src/hotspot/cpu/aarch64/gc/x/xGlobals_aarch64.cpp line 147: > 145: // Default value if probing is not implemented for a certain platform: 128TB > 146: static const size_t DEFAULT_MAX_ADDRESS_BIT = 47; > 147: // Minimum value returned, if probing fails: 64GB any reason you removed the comment for MINIMUM_MAX_ADDRESS_BIT? 
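A note on the numbers in the comments quoted above, since they caused some confusion in this thread: the sizes follow directly from powers of two. The sketch below is only illustrative arithmetic; `bytes_for_address_bits`, `GB` and `TB` are hypothetical helpers, not HotSpot code, and it assumes the "number of address bits" convention under which a value of 47 corresponds to 128TB.

```cpp
#include <cstdint>

// Hypothetical helper (not part of HotSpot): number of bytes that can be
// addressed with `bits` address bits.
constexpr uint64_t bytes_for_address_bits(unsigned bits) {
  return uint64_t{1} << bits;
}

constexpr uint64_t GB = uint64_t{1} << 30;
constexpr uint64_t TB = uint64_t{1} << 40;

// 47 address bits (bit indices 0..46) span 128TB in total ...
static_assert(bytes_for_address_bits(47) == 128 * TB, "47 bits span 128TB");
// ... and the topmost of those bits (index 46) accounts for half of it, 64TB.
static_assert(bytes_for_address_bits(46) == 64 * TB, "bit 46 weighs 64TB");
// Under the same convention a 64GB range corresponds to 36 bits.
static_assert(bytes_for_address_bits(36) == 64 * GB, "36 bits span 64GB");
```

This is why quoting a size next to a bit position can be read two ways: the size of the whole range, or the contribution of that single bit.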
------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18941#pullrequestreview-2046034124 PR Review Comment: https://git.openjdk.org/jdk/pull/18941#discussion_r1594252812 From rkennke at openjdk.org Wed May 8 15:50:20 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 May 2024 15:50:20 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v9] In-Reply-To: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: > The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. > > The proposed fix aims to always enter the main loop(s) with an aligned address: > - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. > - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. 
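The two cases in the list above can be sketched with a little offset arithmetic. This is a simplified model and not the actual intrinsic: it assumes a 64-bit header with an 8-byte mark word, a 4-byte (+CCP) or 8-byte (-CCP) class pointer, a 4-byte length field, `byte[]` elements that follow the length directly, and 8-byte-aligned object addresses. All function names are hypothetical.

```cpp
#include <cstddef>

// Offset of the 4-byte array length field from the object start
// (assumed layout: 8-byte mark word + class pointer).
constexpr std::size_t length_offset(bool compressed_class_pointers) {
  return 8 + (compressed_class_pointers ? 4 : 8);
}

// Offset of the first element, assuming it follows the length directly.
constexpr std::size_t base_offset(bool compressed_class_pointers) {
  return length_offset(compressed_class_pointers) + 4;
}

constexpr bool is_8_byte_aligned(std::size_t offset) {
  return offset % 8 == 0;
}

// Where an 8-byte-wide main loop would start: at the array base if that is
// aligned, otherwise at the length field, so the lengths are compared as
// part of the first loop iteration.
constexpr std::size_t main_loop_start(bool compressed_class_pointers) {
  return is_8_byte_aligned(base_offset(compressed_class_pointers))
             ? base_offset(compressed_class_pointers)
             : length_offset(compressed_class_pointers);
}

static_assert(base_offset(true) == 16, "+CCP: base is 8-byte-aligned");
static_assert(base_offset(false) == 20, "-CCP: base is only 4-byte-aligned");
static_assert(is_8_byte_aligned(main_loop_start(true)), "+CCP start aligned");
static_assert(is_8_byte_aligned(main_loop_start(false)), "-CCP start aligned");
```

Under these assumptions the loop starts at offset 16 in both configurations; only what lives at that offset differs (first element versus length field).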
> > Testing: > - [x] tier1 (+CCP) > - [x] tier1 (-CCP) > - [x] tier2 (+CCP) > - [x] tier2 (-CCP) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Improve comment about extra-length ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18948/files - new: https://git.openjdk.org/jdk/pull/18948/files/84f9a933..8f7fd92d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18948.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18948/head:pull/18948 PR: https://git.openjdk.org/jdk/pull/18948 From asmehra at openjdk.org Wed May 8 16:34:07 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 8 May 2024 16:34:07 GMT Subject: RFR: 8330275: Crash in XMark::follow_array [v6] In-Reply-To: References: Message-ID: > This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail. > Encoding of partial array offset in mark stack requires that the max address bit be no more than 46 bit. ~~But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return value larger that 44 bits. This patch sets the maximum address offset bits to 44.~~ > > ~~I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.~~ > > ~~I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. 
This would avoid running into such problem again if in future maximum address offset bits is increased beyond 44.~~ > > ~~For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generation ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return value greater than 44 bits.~~ > > Testing: ~~test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86~~ tier1, tier2 and tier3 on aarch64 using fastdebug build with options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as per the suggestion in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275?focusedId=14667864&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14667864)) > > Update: Striked out the changes that are not relevant now that it is only doing a point fix for aarch64 Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: Restore the comment around max addressable memory but leave out actual numbers that can be confusing Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18941/files - new: https://git.openjdk.org/jdk/pull/18941/files/1529968d..eae43d0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18941&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18941&range=04-05 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18941/head:pull/18941 PR: https://git.openjdk.org/jdk/pull/18941 From asmehra at openjdk.org Wed May 8 16:34:07 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 8 May 2024 16:34:07 GMT Subject: RFR: 8330275: Crash in XMark::follow_array [v6] In-Reply-To: References: Message-ID: On Wed, 
8 May 2024 15:43:37 GMT, Thomas Stuefe wrote: >> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: >> >> Restore the comment around max addressable memory but leave out actual numbers that can be confusing >> >> Signed-off-by: Ashutosh Mehra > > src/hotspot/cpu/aarch64/gc/x/xGlobals_aarch64.cpp line 147: > >> 145: // Default value if probing is not implemented for a certain platform: 128TB >> 146: static const size_t DEFAULT_MAX_ADDRESS_BIT = 47; >> 147: // Minimum value returned, if probing fails: 64GB > > any reason you removed the comment for MINIMUM_MAX_ADDRESS_BIT? Oh! I think I misunderstood stefank's suggestion. I should have just removed the values 64GB and 128TB mentioned in the comment. Let me restore the rest. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18941#discussion_r1594310310 From mli at openjdk.org Wed May 8 17:20:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 17:20:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> Message-ID: > Hi, > Can you help to review the patch? > This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). > > Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time. > Besides this change, it also modifies the previous changes accordingly, e.g. removes some unnecessary files or changes, especially in the make dir of the jdk.
>
> Besides the code changes, one important task is to handle the legal process.
>
> Thanks!
>
> ## Performance
> NOTE:
> * `Src` means the implementation in this pr, i.e. without dependency on external sleef.
> * `Disabled` means intrinsics are disabled by `-XX:-UseVectorStubs`
> * `system_sleef` means the implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. build and run jdk with dependency on external sleef.
>
> Basically, the perf data below shows that
> * this implementation has better performance than the previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294),
> * and both sleef versions have much better performance compared with the non-sleef version.
>
> |Benchmark                     |(size)|Src      |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disabled-Src)/Src|
> |------------------------------|------|---------|-----|------------|----------------------|---------|------------------|
> |3472:Double128Vector.ACOS     |1024  |8546.842 |ns/op|8516.007    |-0.004                |16799.273|0.966             |
> |3473:Double128Vector.ASIN     |1024  |6864.656 |ns/op|6987.328    |0.018                 |16602.442|1.419             |
> |3474:Double128Vector.ATAN     |1024  |11489.255|ns/op|12261.800   |0.067                 |26329.320|1.292             |
> |3475:Double128Vector.ATAN2    |1024  |16661.170|ns/op|17234.472   |0.034                 |42084.100|1.526             |
> |3476:Double128Vector.CBRT     |1024  |18999.387|ns/op|20298.458   |0.068                 |35998.688|0.895             |
> |3477:Double128Vector.COS      |1024  |14081.857|ns/op|14846.117   |0.054                 |24420.692|0.734             |
> |3478:Double128Vector.COSH     |1024  |12202.306|ns/op|12237.772   |0.003                 |21343.863|0.749             |
> |3479:Double128Vector.EXP      |1024  |4553.108 |ns/op|4777.638    |0.049                 |20155.903|3.427             |
> |3480:D...
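For readers checking the derived columns of the table: the two ratio columns appear to be plain relative differences against the `Src` column, rounded to three decimals. A minimal sketch of that arithmetic follows; `relative_diff` and `round3` are hypothetical helper names, not part of any benchmark harness.

```cpp
#include <cmath>

// Relative difference of `other` against the baseline `src`, as in the
// "(x-Src)/Src" columns. For ns/op, a positive value means `other` is slower.
inline double relative_diff(double src, double other) {
  return (other - src) / src;
}

// Round to three decimals, the precision shown in the table.
inline double round3(double value) {
  return std::round(value * 1000.0) / 1000.0;
}
```

For the ACOS row, `round3(relative_diff(8546.842, 16799.273))` gives 0.966 and `round3(relative_diff(8546.842, 8516.007))` gives -0.004, matching the table: with intrinsics disabled the operation takes roughly 97% longer, while the external-sleef build is within half a percent of this pr.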
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add inline header file for riscv64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18605/files - new: https://git.openjdk.org/jdk/pull/18605/files/cbcd4634..bd9c0931 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=03-04 Stats: 7073 lines in 1 file changed: 7073 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605 PR: https://git.openjdk.org/jdk/pull/18605 From mli at openjdk.org Wed May 8 17:25:59 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 17:25:59 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> Message-ID: On Tue, 9 Apr 2024 20:10:36 GMT, Mikael Vidstedt wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - disable unused-function warnings; add log msg >> - minor > > Thank you for the update and for working on this in general. > > I've started working on JDK-8329816, preparing the change for the SLEEF specific part of the change. Specifically, I'm currently planning on including the three SLEEF header files, the README and a legal/sleef.md file in that change. Let me know if you have any thoughts/concerns. > > Also, just for my understanding, would love to understand your thoughts on the future here (I apologize if this was already discussed elsewhere): > > It seem like SLEEF is (sort of) limited to linux at this point (the SLEEF README mentions that "Due to limited test capacities, SLEEF is currently only officially supported on Linux with gcc or llvm/clang." ). 
That same README does, however, indicate good test coverage on several architectures in addition to aarch64 (including x86_64, PPC, RISC-V). With that in mind, it looks like we could potentially use SLEEF for other architectures on linux in the future? And potentially additional operating systems as well? Hey @vidmik , I just added inline header file for riscv64, hope to help avoid go through the legal process for arm and riscv header files separately. For implementation on riscv64, I will put it in another pr. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18605#issuecomment-2101055673 From thomas.schatzl at oracle.com Wed May 8 17:32:35 2024 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 8 May 2024 19:32:35 +0200 Subject: RFR: 8319548: Unexpected internal name for Filler array klass causes error in VisualVM In-Reply-To: <3N7b5H6FtKT1e5pk-IDnU4GtnV1oadvj461vyBwMfRw=.0d25265e-3fa6-4923-9fc9-f9a4ba840592@github.com> References: <3N7b5H6FtKT1e5pk-IDnU4GtnV1oadvj461vyBwMfRw=.0d25265e-3fa6-4923-9fc9-f9a4ba840592@github.com> Message-ID: <17b4e39c-6c84-431b-ba7f-2a44c3c72ab0@oracle.com> (The mail probably did not get sent to the correct recipient, so it did not show up in the PR again; retry) Hi, On 08.05.24 04:21, jjscl8888 wrote: > On Fri, 3 May 2024 12:50:45 GMT, Thomas Schatzl > wrote: > > > Thank you for your previous question. I have another inquiry regarding > compiling the JDK source code. I've noticed that when I compile the > JDK without selecting specific configure parameters, the resulting JDK > size differs from the official version available on the website. I'm > curious to know which configuration parameters were used for the > official LTS (Long-Term Support) version of the JDK. The page at https://wiki.openjdk.org/display/Build/Supported+Build+Platforms contains some information about the build platforms. I will ask around if there is anything more specific than that. 
Otherwise the people on the build-dev openjdk mailing list may be able to answer you more appropriately. Further, I would recommend that you join the appropriate mailing list (hotspot-gc-dev at openjdk.org or build-dev at openjdk.org) for asking questions unrelated to this long-closed PR. Hth, Thomas From mli at openjdk.org Wed May 8 17:41:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 May 2024 17:41:23 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v6] In-Reply-To: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> References: <0cUurmXlMJ_B66Wy1umd2n4r9ve7_Q4WOU0ffMd8s5Y=.bbc93b65-382c-4139-aaec-cb835d94a06e@github.com> Message-ID: <2VxBcA-0qxX3N35u5vnKyT920nTH5llf2k5_sKQcqT8=.23823400-536f-458e-baf7-53f99547abc4@github.com> > Hi, > Can you help to review the patch? > This pr is based on previous work and discussion in [pr 16234](https://github.com/openjdk/jdk/pull/16234), [pr 18294](https://github.com/openjdk/jdk/pull/18294). > > Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check `src/jdk.incubator.vector/linux/native/libvectormath/README`), rather than depending on external sleef things (header or lib) at build or run time. > Besides this change, it also modifies the previous changes accordingly, e.g. removes some unnecessary files or changes, especially in the make dir of the jdk. > > Besides the code changes, one important task is to handle the legal process. > > Thanks! > > ## Performance > NOTE: > * `Src` means the implementation in this pr, i.e. without dependency on external sleef. > * `Disabled` means intrinsics are disabled by `-XX:-UseVectorStubs` > * `system_sleef` means the implementation in [previous pr 18294](https://github.com/openjdk/jdk/pull/18294), i.e. build and run jdk with dependency on external sleef.
>
> Basically, the perf data below shows that
> * this implementation has better performance than the previous version in [pr 18294](https://github.com/openjdk/jdk/pull/18294),
> * and both sleef versions have much better performance than the non-sleef version.
>
> |Benchmark                     |(size)|Src      |Units|system_sleef|(system_sleef-Src)/Src|Disabled |(Disabled-Src)/Src|
> |------------------------------|------|---------|-----|------------|----------------------|---------|------------------|
> |3472:Double128Vector.ACOS     |1024  |8546.842 |ns/op|8516.007    |-0.004                |16799.273|0.966             |
> |3473:Double128Vector.ASIN     |1024  |6864.656 |ns/op|6987.328    |0.018                 |16602.442|1.419             |
> |3474:Double128Vector.ATAN     |1024  |11489.255|ns/op|12261.800   |0.067                 |26329.320|1.292             |
> |3475:Double128Vector.ATAN2    |1024  |16661.170|ns/op|17234.472   |0.034                 |42084.100|1.526             |
> |3476:Double128Vector.CBRT     |1024  |18999.387|ns/op|20298.458   |0.068                 |35998.688|0.895             |
> |3477:Double128Vector.COS      |1024  |14081.857|ns/op|14846.117   |0.054                 |24420.692|0.734             |
> |3478:Double128Vector.COSH     |1024  |12202.306|ns/op|12237.772   |0.003                 |21343.863|0.749             |
> |3479:Double128Vector.EXP      |1024  |4553.108 |ns/op|4777.638    |0.049                 |20155.903|3.427             |
> |3480:D...
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: update header files for arm ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18605/files - new: https://git.openjdk.org/jdk/pull/18605/files/bd9c0931..36415c34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18605&range=04-05 Stats: 20 lines in 2 files changed: 14 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605 PR: https://git.openjdk.org/jdk/pull/18605 From dlong at openjdk.org Wed May 8 18:23:51 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 18:23:51 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Wed, 8 May 2024 10:45:28 GMT, Tobias Holenstein wrote: >> `WXWrite` is needed for >> >> >> JfrIntrinsicSupport::return_lease -> >> ThreadStateTransition::transition_from_native -> >> SafepointMechanism::process_if_requested_with_exit_check -> >> SafepointMechanism::process_if_requested -> >> JavaThread::check_possible_safepoint -> >> assert_wx_state(WXWrite) > >> Normally we would want to be in the WXExec state when executing in _thread_in_native. > I agree. So we would need to aquire `WXWrite` twice just for `ThreadStateTransition::transition_from_java` and again for `ThreadStateTransition::transition_from_native`. I think its a bit unfortune that `WXWrite` is needed for the state transition.. I don't think we need WXWrite for transition_from_java. I don't know how useful AssertWXAtThreadSync is if it forces us to make unnecessary transitions. It seems to go in the opposite direction from the more lazy approaches discussed in JDK-8307817. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19102#discussion_r1594460019 From dlong at openjdk.org Wed May 8 19:45:53 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 8 May 2024 19:45:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v10] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:30:30 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > null nullptr oops Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18940#pullrequestreview-2046505933 From iklam at openjdk.org Wed May 8 20:24:20 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 8 May 2024 20:24:20 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v8] In-Reply-To: References: Message-ID: > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. > > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). 
We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. > > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge branch 'master' into 8330532-improve-line-oriented-text-parsing-in-hotspot - use 'override' instead of 'virtual' - No need to call set_input(null_ptr) from inputStream destructor - Merge branch 'master' of https://github.com/openjdk/jdk into 8330532-improve-line-oriented-text-parsing-in-hotspot - inputStream::_buffer can never be nullptr - set _buffer to _small_buffer in InputStream constructor - removed Input::close() - BlockInputStream is used by gtest only, so moved it there - removed unused set_position(), etc - removed _must_free - ... 
and 8 more: https://git.openjdk.org/jdk/compare/d05bf4f5...08502fe5 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18833/files - new: https://git.openjdk.org/jdk/pull/18833/files/2ddbfea9..08502fe5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=06-07 Stats: 1712 lines in 99 files changed: 1121 ins; 336 del; 255 mod Patch: https://git.openjdk.org/jdk/pull/18833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18833/head:pull/18833 PR: https://git.openjdk.org/jdk/pull/18833 From asmehra at openjdk.org Wed May 8 20:29:00 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 8 May 2024 20:29:00 GMT Subject: RFR: 8330275: Crash in XMark::follow_array [v6] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 16:34:07 GMT, Ashutosh Mehra wrote: >> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail. >> Encoding of partial array offset in mark stack requires that the max address bit be no more than 46 bit. ~~But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return value larger that 44 bits. This patch sets the maximum address offset bits to 44.~~ >> >> ~~I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.~~ >> >> ~~I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. 
This would avoid running into such problem again if in future maximum address offset bits is increased beyond 44.~~ >> >> ~~For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generation ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return value greater than 44 bits.~~ >> >> Testing: ~~test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86~~ tier1, tier2 and tier3 on aarch64 using fastdebug build with options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as per the suggestion in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275?focusedId=14667864&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14667864)) >> >> Update: Striked out the changes that are not relevant now that it is only doing a point fix for aarch64 > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Restore the comment around max addressable memory but leave out actual numbers that can be confusing > > Signed-off-by: Ashutosh Mehra As the last commit is a trivial change to add the comments back, I am not requesting new review and integrating it as is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18941#issuecomment-2101365823 From asmehra at openjdk.org Wed May 8 20:29:01 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 8 May 2024 20:29:01 GMT Subject: Integrated: 8330275: Crash in XMark::follow_array In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 20:22:52 GMT, Ashutosh Mehra wrote: > This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail. 
> Encoding of partial array offset in mark stack requires that the max address bit be no more than 46 bit. ~~But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return value larger that 44 bits. This patch sets the maximum address offset bits to 44.~~ > > ~~I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.~~ > > ~~I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. This would avoid running into such problem again if in future maximum address offset bits is increased beyond 44.~~ > > ~~For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generation ZGC is very different from other platforms and from generational mode as well. I have kept the existing implementation as is and just fixed it to ensure it does not return value greater than 44 bits.~~ > > Testing: ~~test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86~~ tier1, tier2 and tier3 on aarch64 using fastdebug build with options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as per the suggestion in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275?focusedId=14667864&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14667864)) > > Update: Striked out the changes that are not relevant now that it is only doing a point fix for aarch64 This pull request has now been integrated. 
Changeset: 42b1d858 Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/42b1d858d15fd06de9ce41b08b430b12724652e9 Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod 8330275: Crash in XMark::follow_array Reviewed-by: stefank, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/18941 From iklam at openjdk.org Wed May 8 22:28:03 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 8 May 2024 22:28:03 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap [v2] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 16:38:23 GMT, Matias Saavedra Silva wrote: >> The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. >> >> These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Chris comments and cleanup Looks good. I will suggest making some clean up to improve the readability of the existing code. src/hotspot/share/cds/archiveUtils.cpp line 325: > 323: assert((intptr_t)obj >= 0 || (intptr_t)obj < -100, > 324: "hit tag while initializing ptrs."); > 325: *p = obj != 0 ? (void*)(SharedBaseAddress + obj) : (void*)obj; `nullptr` should be used instead of `0`. E.g., `obj != (void*)nullptr` src/hotspot/share/cds/cppVtables.cpp line 236: > 234: if (!soc->reading()) { > 235: _vtables_serialized_base = soc->region_top(); > 236: } The new `region_top()` API may be confusing with the existing `do_region()` API, which has a completely different meaning for `region`. 
I think it's better to rename `do_region()` to // iterate on the pointers from p[0] through p[num_pointers-1] SerializeClosure do_ptrs(void** p, int num_pointers); Also, there's no need to add a new `region_top()` API -- there are already too many functions that deal with a "region" of different types, and you need to wonder what this particular "region" is. `soc->region_top()` can be replaced with `ArchiveBuilder::current()->current_dump_space()->top()` Also, in archiveBuilder.hpp, `dump_space` means the same thing as `dump_region`. All of the former should be changed to the latter for uniformity. ------------- Changes requested by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19107#pullrequestreview-2046836828 PR Review Comment: https://git.openjdk.org/jdk/pull/19107#discussion_r1594750309 PR Review Comment: https://git.openjdk.org/jdk/pull/19107#discussion_r1594766914 From dlong at openjdk.org Thu May 9 02:38:51 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 9 May 2024 02:38:51 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:10:08 GMT, Tobias Holenstein wrote: > The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. > > There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. FWIW, I decided to look into WXExec as default (JDK-8328306), and in my draft so far I have removedAssertWXAtThreadSync completely, and I suspect that a successful implementation of exec-by-default will make JDK-8307817 no longer needed as well. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19102#issuecomment-2101823326 From duke at openjdk.org Thu May 9 03:00:17 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 9 May 2024 03:00:17 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v4] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: refine comments Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/d8b8dbfe..85510777 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Thu May 9 03:04:22 2024 From: duke at 
openjdk.org (Jin Guojie) Date: Thu, 9 May 2024 03:04:22 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v5] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: refine comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/85510777..87451f56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Thu May 9 03:11:18 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 9 May 2024 03:11:18 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v6] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize 
integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: Refine is_neoverse_family() Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/87451f56..a4698a98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=04-05 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Thu May 9 03:21:02 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 9 May 2024 03:21:02 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v7] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. 
> > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: Remove unused functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/a4698a98..9cbff831 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=05-06 Stats: 14 lines in 1 file changed: 0 ins; 4 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Thu May 9 03:24:52 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 9 May 2024 03:24:52 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 02:45:13 GMT, Eric Liu wrote: >> Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: >> >> Applicable platforms expanded to the entire neoverse family >> >> Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit 
(Integer multi cycle). It should benefit the V series as well. > > src/hotspot/cpu/aarch64/vm_version_aarch64.hpp line 181: > >> 179: (model_is(CPU_MODEL_NEOVERSE_V1) || model_is(CPU_MODEL_NEOVERSE_V2)); >> 180: } >> 181: > > Not in used. In this commit, is_neoverse_n_series() is removed. https://github.com/openjdk/jdk/pull/19093/commits/9cbff8319ad808276665339f1313e373a81b392d However, is_neoverse_v_series() is called in vm_version_aarch64.cpp, if (is_neoverse_v_series()) { if (FLAG_IS_DEFAULT(UseCryptoPmullForCRC32)) { FLAG_SET_DEFAULT(UseCryptoPmullForCRC32, true); } } so this function remains unchanged. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1594918558 From duke at openjdk.org Thu May 9 03:29:58 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 03:29:58 GMT Subject: Withdrawn: 8327522: Shenandoah: Remove unused references to satb_mark_queue_active_offset In-Reply-To: References: Message-ID: <-gNBJE3fcagLhqWF86kgPgMyvbcoPNpAfRYLYvTd9WQ=.4f1c59de-add5-47f0-b5e9-517fd59e6da0@github.com> On Thu, 7 Mar 2024 08:00:46 GMT, Yude Lin wrote: > Removed an unused variable (trivial) > > Also, there is another place that uses satb_mark_queue_active_offset which is ShenandoahBarrierSetC2::verify_gc_barriers > The current barrier pattern is different from what this code is expecting: > If->Bool->CmpI->AndI->LoadUB->AddP->ConL(gc_state_offset) > rather than > If->Bool->CmpI->LoadB->AddP->ConL(marking_offset) > However, this code isn't doing as much checking as its counterpart in G1 anyway (so I'm thinking removing the incorrect matching code altogether?) Looking forward to your suggestions. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/18148 From duke at openjdk.org Thu May 9 03:30:04 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 9 May 2024 03:30:04 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v8] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: Refine comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/9cbff831..bb417893 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Thu May 9 03:40:53 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 9 May 2024 03:40:53 GMT Subject: 
RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 03:14:43 GMT, Eric Liu wrote: >> Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: >> >> Applicable platforms expanded to the entire neoverse family >> >> Even on the V series (V1 and V2), both sdiv/udiv and msub instructions are executed in M0 unit (Integer multi cycle). It should benefit the V series as well. > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 465: > >> 463: /* On Neoverse, MSUB uses the same ALU with SDIV. >> 464: * The combination of MUL/SUB can utilize multiple ALUs, >> 465: * and is much faster than MSUB. */ > > Please refine this comment. I suppose this combination can benefit other situiations which multiple instructions grab the M0, not just for MSUB + SDIV. Comments refined in https://github.com/openjdk/jdk/pull/19093/commits/bb4178931eca35cfac26bbdce04c22bdc8026805 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1594926362 From duke at openjdk.org Thu May 9 03:56:59 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 03:56:59 GMT Subject: Withdrawn: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:10:54 GMT, Quan Anh Mai wrote: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. > > Please kindly give your opinion as well as your reviews, thanks very much. 
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17527 From duke at openjdk.org Thu May 9 04:02:07 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 04:02:07 GMT Subject: Withdrawn: 8311846: Resolve duplicate 'Thread' related symbols with JDK static linking In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 00:14:58 GMT, Jiangli Zhou wrote: > Please review this PR with a simple solution for resolving the duplicate `Thread` symbol issue. In the https://github.com/openjdk/jdk/pull/14808 comments, there was an alternative suggestion to redefine the symbol at build time, such as using `-DThread=HotSpotThread`. That would not address cases where the symbol is referenced in string literals. https://github.com/openjdk/jdk/pull/14808 also discussed using a namespace for hotspot code, which can have multiple benefits/motivations. We could explore using a namespace further once there is more consensus on that approach. > > Contributed by Chuck Rasbold and @jianglizhou. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17456 From iklam at openjdk.org Thu May 9 04:02:08 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 9 May 2024 04:02:08 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v9] In-Reply-To: References: Message-ID: > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time.
Examples: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. > > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other sources of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. > > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00. See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionality for reading a line at a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
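
For readers of the archive, the two problems listed in the description above (a fixed line buffer, and separate code paths for `FILE*` and memory) can be illustrated with a small standalone sketch. This is not the `inputStream` code from the PR: `LineSource`, `MemorySource`, `FileSource`, `LineReader` and `read_all_lines` are hypothetical names, and the sketch uses `std::string` for brevity, which HotSpot code itself would presumably not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical abstraction over "where the bytes come from".
class LineSource {
 public:
  virtual ~LineSource() = default;
  // Copy up to `size` bytes into `buf`; return bytes copied, 0 at end of input.
  virtual size_t read(char* buf, size_t size) = 0;
};

// Input taken from a block of memory.
class MemorySource : public LineSource {
  const char* _p;
  size_t _left;
 public:
  MemorySource(const char* base, size_t len) : _p(base), _left(len) {}
  size_t read(char* buf, size_t size) override {
    size_t n = size < _left ? size : _left;
    memcpy(buf, _p, n);
    _p += n;
    _left -= n;
    return n;
  }
};

// Input taken from an already opened FILE*; the caller keeps ownership.
class FileSource : public LineSource {
  FILE* _f;
 public:
  explicit FileSource(FILE* f) : _f(f) {}
  size_t read(char* buf, size_t size) override { return fread(buf, 1, size, _f); }
};

// Reads one line at a time; the pending buffer grows as needed, so a long
// (but valid) line is never truncated or rejected.
class LineReader {
  LineSource& _src;
  std::string _pending;  // bytes read from the source but not yet returned
  bool _eof = false;
 public:
  explicit LineReader(LineSource& src) : _src(src) {}

  // Stores the next line (without its terminator) in `line`.
  // Returns false once the input is exhausted.
  bool next_line(std::string& line) {
    for (;;) {
      size_t nl = _pending.find('\n');
      if (nl != std::string::npos) {
        line.assign(_pending, 0, nl);
        _pending.erase(0, nl + 1);
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF input
        return true;
      }
      if (_eof) {
        if (_pending.empty()) return false;
        line.swap(_pending);  // last line had no trailing newline
        _pending.clear();
        return true;
      }
      char chunk[16];  // deliberately tiny: lines may span many chunks
      size_t n = _src.read(chunk, sizeof(chunk));
      if (n == 0) {
        _eof = true;
      } else {
        _pending.append(chunk, n);
      }
    }
  }
};

// Convenience helper: collect every line from a source.
std::vector<std::string> read_all_lines(LineSource& src) {
  LineReader reader(src);
  std::vector<std::string> lines;
  std::string line;
  while (reader.next_line(line)) lines.push_back(line);
  return lines;
}
```

The same `LineReader` works unchanged over a `MemorySource` or a `FileSource`, which is the point of putting the source behind one small interface; a socket-backed source would only need to implement `read()`.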
Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - fixed windows build - removed unused fileStream::{position/set_position/remaining} ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18833/files - new: https://git.openjdk.org/jdk/pull/18833/files/08502fe5..34becabb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=07-08 Stats: 58 lines in 3 files changed: 0 ins; 57 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18833/head:pull/18833 PR: https://git.openjdk.org/jdk/pull/18833 From duke at openjdk.org Thu May 9 04:04:00 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 04:04:00 GMT Subject: Withdrawn: 8323273: AArch64: Strengthen CompressedClassPointers initialization check for base In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 02:41:40 GMT, Yude Lin wrote: > Summary: > Add a platform-dependent check for CompressedClassSpaceBaseAddress; > Remove the "reserve anywhere" attempt after the initial mapping attempt failed---this is rarely used and will likely fail anyway, because the accepted mapping is very restricted on aarch64; > Additional assertions after initialization. > > Passed hotspot/jtreg/:tier1 on fastdebug This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/17437 From duke at openjdk.org Thu May 9 04:11:02 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 04:11:02 GMT Subject: Withdrawn: 8322476: Remove GrowableArray C-Heap version, replace usages with GrowableArrayCHeap In-Reply-To: <7BF6OZ3vRH791MKbVeJqQ5foScHax_gLMFjSKkm3J68=.f29e5b1c-0751-4257-b253-261c6e20a7b9@github.com> References: <7BF6OZ3vRH791MKbVeJqQ5foScHax_gLMFjSKkm3J68=.f29e5b1c-0751-4257-b253-261c6e20a7b9@github.com> Message-ID: On Tue, 19 Dec 2023 16:59:05 GMT, Emanuel Peter wrote: > [JDK-8247755](https://bugs.openjdk.org/browse/JDK-8247755) introduced the `GrowableArrayCHeap`. This duplicates the current C-Heap allocation capability in `GrowableArray`. I now remove that from `GrowableArray` and move all usages to `GrowableArrayCHeap`. > > This has a few advantages: > - Clear separation between arena (and resource area) allocating array and C-heap allocating array. > - We can prevent assigning / copying between arrays of different allocation strategies already at compile time, and not only with asserts at runtime. > - We should not have multiple implementations of the same thing (C-Heap backed array). > - `GrowableArrayCHeap` is NONCOPYABLE. This is a nice restriction, we now know that C-Heap backed arrays do not get copied unknowingly. > > **Bonus** > We can now restrict `GrowableArray` element type `E` to be `std::is_trivially_destructible<E>::value == true`. The idea is that arena / resource allocated arrays get abandoned, often without being even cleared. Hence, the elements in the array are never destructed. But if we only use elements that are trivially destructible, then it makes no difference if the destructors are ever called, or the elements simply abandoned. > > For `GrowableArrayCHeap`, we expect that the user eventually calls the destructor for the array, which in turn calls the destructors of the remaining elements. Hence, it is up to the user to ensure the cleanup.
And so we can allow non-trivial destructors. > > **Testing** > Tier1-3 + stress testing: pending This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17160 From duke at openjdk.org Thu May 9 04:23:05 2024 From: duke at openjdk.org (duke) Date: Thu, 9 May 2024 04:23:05 GMT Subject: Withdrawn: 8320709: AArch64: Vectorized Poly1305 intrinsics In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 17:12:25 GMT, Andrew Haley wrote: > Vectorizing Poly1305 is quite tricky. We already have a highly- > efficient scalar Poly1305 implementation that runs on the core integer > unit, but it's highly serialized, so it does not make good use of > the parallelism available. > > The scalar implementation takes advantage of some particular features > of the Poly1305 keys. In particular, certain bits of r, the secret > key, are required to be 0. These make it possible to use a full > 64-bit-wide multiply-accumulate operation without needing to process > carries between partial products. > > While this works well for a serial implementation, a parallel > implementation cannot do this because rather than multiplying by r, > each step multiplies by some integer power of r, modulo > 2^130-5. > > In order to avoid processing carries between partial products we use a > redundant representation, in which each 130-bit integer is encoded > either as a 5-digit integer in base 2^26 or as a 3-digit integer in > base 2^52, depending on whether we are using a 64- or 32-bit > multiply-accumulate. > > In AArch64 Advanced SIMD, there is no 64-bit multiply-accumulate > operation available to us, so we must use 32*32 -> 64-bit operations. > > In order to achieve maximum performance we'd like to get close to the > processor's decode bandwidth, so that every clock cycle does something > useful.
In a typical high-end AArch64 implementation, the core integer > unit has a fast 64-bit multiplier pipeline and the ASIMD unit has a > fast(ish) two-way 32-bit multiplier, which may be slower than the > core integer unit's. It is not at all obvious whether it's best to use > ASIMD or core instructions. > > Fortunately, if we have a wide-bandwidth instruction decode, we can do > both at the same time, by feeding alternating instructions to the core > and the ASIMD units. This also allows us to make good use of all of > the available core and ASIMD registers, in parallel. > > To do this we use generators, which here are a kind of iterator that > emits a group of instructions each time it is called. In this case we > use 4 parallel generators, and by calling them alternately we interleave > the ASIMD and the core instructions. We also take care to ensure that > each generator finishes at about the same time, to maximize the > distance between instructions which generate and consume data. > > The results are pretty good, ranging from 2* - 3* speedup. It is > possible that a pure in-order processor (Raspberry Pi?) might be at > some disadvantage because more work is being done even though it is > highly parallel, b... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16812 From lmao at openjdk.org Thu May 9 04:50:55 2024 From: lmao at openjdk.org (Liang Mao) Date: Thu, 9 May 2024 04:50:55 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: On Wed, 8 May 2024 11:28:15 GMT, Erik Österlund wrote: > Did you check how many of the stores where g1_can_remove_pre_barrier said false and you would have said true, were elided anyway during store capturing (cf.
InitializeNode::capture_store), or as part of G1BarrierSetC2::eliminate_gc_barrier? In other words, how many barriers are you eliding, that were not in fact already elided, just a bit later on? There is only 1 store in JBB for which g1_can_remove_pre_barrier returns false and which was elided by this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1594958770 From iklam at openjdk.org Thu May 9 04:52:54 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 9 May 2024 04:52:54 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot In-Reply-To: References: Message-ID: On Wed, 8 May 2024 05:44:46 GMT, John Rose wrote: > Ioi, I would support reducing coupling with fileStream by removing old rewind and new position/remaining functions. But I'd rather keep the new functions, because I think they are likely to be useful. I have future uses in mind, which might or might not happen. For example, open a CDS archive or large config file, position the fileStream at the base address of some textual configuration data, and start reading. Hi John, I decided to remove the position/set_position/remaining functions from the PR, as we no longer have test cases that cover them. When they are needed in the future, we can restore them from history and add the necessary tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18833#issuecomment-2101928284 From iklam at openjdk.org Thu May 9 06:03:24 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 9 May 2024 06:03:24 GMT Subject: RFR: 8330532: Improve line-oriented text parsing in HotSpot [v10] In-Reply-To: References: Message-ID: > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time.
Examples: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. > > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other sources of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. > > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionality for reading a line at a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Need to apply the same fdopen work-around as in JDK-8216184 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18833/files - new: https://git.openjdk.org/jdk/pull/18833/files/34becabb..909e5175 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18833&range=08-09 Stats: 17 lines in 3 files changed: 14 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18833/head:pull/18833 PR: https://git.openjdk.org/jdk/pull/18833 From eliu at openjdk.org Thu May 9 06:32:53 2024 From: eliu at openjdk.org (Eric Liu) Date: Thu, 9 May 2024 06:32:53 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v8] In-Reply-To: References: Message-ID: <1uckGP_h1GTioQazO1a6qoqRhex6Gv3ytolJn63DzyY=.09c6cd50-300c-4ec4-b4f9-76ec58574898@github.com> On Thu, 9 May 2024 03:30:04 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. 
>> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: > > Refine comments Looks good to me. ------------- Marked as reviewed by eliu (Committer). PR Review: https://git.openjdk.org/jdk/pull/19093#pullrequestreview-2047239434 From kirk at kodewerk.com Thu May 9 06:40:36 2024 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Thu, 9 May 2024 08:40:36 +0200 Subject: [EXTERNAL] Discuss: Prevent jlink runtimes from reading _JAVA_OPTIONS In-Reply-To: References: <97917bce-8e4e-48d6-a459-5dc166a7b288@oracle.com> Message-ID: <2C21DE71-43BF-41DA-A936-4EF5DA4F6C20@kodewerk.com> Hi all, The use of _JAVA_OPTIONS is part of the long-established pattern for configuring a process. This pattern prefers configurations starting with command line over environment variables over configuration files. I can understand why this is problematic if one is unaware, but this is a decades-old, well-established practice that the JVM follows. Kind regards, Kirk > On May 8, 2024, at 7:00 AM, Bruno Borges wrote: > > Thanks Alan. > > I'll follow up there > > Sent from mobile device.
> From: Alan Bateman > Sent: Tuesday, May 7, 2024 9:43:27 PM > To: Bruno Borges ; hotspot-dev at openjdk.org > Subject: [EXTERNAL] Re: Discuss: Prevent jlink runtimes from reading _JAVA_OPTIONS > > > On 08/05/2024 04:25, Bruno Borges wrote: > > In this Reddit discussion [1], the user complains that a jlinked > > runtime of their application, packaged with jpackage, was failing to > > some degree due to the environment variable _JAVA_OPTIONS being set > > somewhere else in the system. > > > > I do agree with the user that a runtime shipped as a built-in > > component of a Java-based standalone application should not have its > > properties altered due to a magical environment variable. > > > > I'd like to ask if it is reasonable to suggest that in the case of a > > jlinked runtime, this should not happen. > > > There was another thread about this a few days ago [1]. > > -Alan > > [1] https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openjdk.org%2Fpipermail%2Fhotspot-dev%2F2024-May%2F088245.html&data=05%7C02%7CBruno.Borges%40microsoft.com%7Cf45d571f168c44eb82b908dc6f196f34%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638507402252381755%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=ZCiuEnWjSdSqRLH0luxNRksK0wYm2IGSzDwuvSQkHFY%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Thu May 9 07:03:05 2024 From: david.holmes at oracle.com (David Holmes) Date: Thu, 9 May 2024 17:03:05 +1000 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: References: Message-ID: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> Hi, On 4/05/2024 10:49 am, Christopher Schnick wrote: > Hello there, > > I wasn't entirely sure whether this is the correct mailing list for > this, but it was the best match for me skimming through all the > available mailing lists. 
Feel free to point me to a better suited one if > I'm wrong here. > > We develop and distribute Java desktop applications to users by creating > standalone application images with jpackage. Everything is working fine, > however there was a recent issue where some users couldn't get the > application to work correctly. After some investigation, it turned out > that the affected users had set the environment variable _JAVA_OPTIONS > with a few JVM arguments, particularly Xmx parameters that were way too > low for our application. I was quite surprised that these apply to self > contained jpackage applications, because for me this is not in the > spirit of an isolated and self contained application. I was even more > surprised that it overwrote existing arguments as we had our own values > for Xmx set in the application image, but these were ignored in favour > of _JAVA_OPTIONS. And I'm under the impression that this behavior cannot > be disabled. (Please correct me if I'm wrong) How does such a jpackaged application actually launch/load the JVM? I'm wondering if there is a way to insert a new "shell" environment to launch the JVM without having those env vars present ... though I guess there may be other env vars that your application still needs. David ----- > While I see that there is definitely some use case for having this > option available to allow users to customize their environment > uniformly, I would say that this causes usually more harm than good in > this case. The cases of unintentional interference are probably much > higher than intentional configuration, which requires specific > application knowledge to work in the first place. > > If someone has set up a few Java 8 application on their system via > normal jars and has configured a few options for them, I don't want them > to apply to my application image that runs on Java 21. As the developer, > I also don't want the user even having to bother with thinking about > this possibility. 
I also don't even know if the application starts up if > the variable contains unrecognized options. Overall I'm not advocating > here to fully remove this behavior, but at least thinking about giving > application developers some option to disable external JVM argument > sourcing for jlink/jpackage. I hope that this proposal can be considered. > > Best > Christopher Schnick > From mli at openjdk.org Thu May 9 07:07:56 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 07:07:56 GMT Subject: RFR: 8322753: RISC-V: C2 ReverseBytesV [v6] In-Reply-To: <0CfgzUKbuhVHe1V1v03tuRv1VaYvEkWYSyuJAW7oiCk=.5487fa17-7163-4d12-aa6f-6c4bfe45373b@github.com> References: <0CfgzUKbuhVHe1V1v03tuRv1VaYvEkWYSyuJAW7oiCk=.5487fa17-7163-4d12-aa6f-6c4bfe45373b@github.com> Message-ID: On Wed, 8 May 2024 13:29:18 GMT, Hamlin Li wrote: >> Hi, >> Can you review this patch to add the ReverseBytesV intrinsic? >> Thanks. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > space Thanks @luhenry @RealFYang for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19120#issuecomment-2102077469 From mli at openjdk.org Thu May 9 07:07:57 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 May 2024 07:07:57 GMT Subject: Integrated: 8322753: RISC-V: C2 ReverseBytesV In-Reply-To: References: Message-ID: On Tue, 7 May 2024 13:29:33 GMT, Hamlin Li wrote: > Hi, > Can you review this patch to add the ReverseBytesV intrinsic? > Thanks. This pull request has now been integrated.
Changeset: 964d6089 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/964d60892eec5e64942b49182a4c6d4105620acd Stats: 41 lines in 6 files changed: 36 ins; 0 del; 5 mod 8322753: RISC-V: C2 ReverseBytesV Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/19120 From Alan.Bateman at oracle.com Thu May 9 07:40:23 2024 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Thu, 9 May 2024 08:40:23 +0100 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> Message-ID: On 09/05/2024 08:03, David Holmes wrote: > > How does such a jpackaged application actually launch/load the JVM? > I'm wondering if there is a way to insert a new "shell" environment to > launch the JVM without having those env vars present ... though I > guess there may be other env vars that your application still needs. For modular applications, there is a jlink option to generate a launcher (script) for the application. That's a potential place to unset environment variables that shouldn't be inherited. It may not help here as it sounds like this is an application image produced by jpackage with a native launcher, and the warning message is hidden as there is no console (I assume). I think we should consider deprecating and eventually removing _JAVA_OPTIONS. It's always been problematic that it appends rather than prepends, and it has issues in areas such as quoting. When JDK_JAVA_OPTIONS was added, we had hoped that developers would move away from the undocumented env variable. The new env variable fixes a bunch of things in the areas of quoting and arg files, works with launcher options, and of course prepends, so it doesn't override options.
-Alan From iklam at openjdk.org Thu May 9 07:46:00 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 9 May 2024 07:46:00 GMT Subject: Integrated: 8330532: Improve line-oriented text parsing in HotSpot In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 03:51:06 GMT, Ioi Lam wrote: > (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text) > > HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Examples: > > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169 > - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066 > > Common problems: > - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors. > - There's ad-hoc code that deals with `FILE*` differently than from memory. > > This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs. > > The API allows other sources of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it. > > In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.) > > Credit: > The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html . > > John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionality for reading a line at a time.
Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR). This pull request has now been integrated. Changeset: ac86f59e Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/ac86f59e4f5382d5c3e8984532dd210611db7dcb Stats: 1355 lines in 11 files changed: 1198 ins; 89 del; 68 mod 8330532: Improve line-oriented text parsing in HotSpot Co-authored-by: John R Rose Reviewed-by: matsaave, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/18833 From dholmes at openjdk.org Thu May 9 08:03:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 May 2024 08:03:53 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v10] In-Reply-To: References: Message-ID: <12yOU9bb-Ri1gf0nXqCo1DgOlFRes73FUN4TzVuJmi4=.90af37ca-0aab-416c-94f1-e10352f97696@github.com> On Wed, 8 May 2024 08:30:30 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > null nullptr oops Looks good! Thanks for such a detailed and thorough investigation of this issue @kevinjwalls and @dean-long . ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18940#pullrequestreview-2047381821 From kevinw at openjdk.org Thu May 9 08:31:56 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 9 May 2024 08:31:56 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v10] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 08:30:30 GMT, Kevin Walls wrote: >> Removal of JavaThread's MonitorChunks member. 
This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. >> >> Access to it in is_lock_owned() was racy, and caused rare crashes. > > Kevin Walls has updated the pull request incrementally with one additional commit since the last revision: > > null nullptr oops Thanks for the reviews and help! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18940#issuecomment-2102201745 From eosterlund at openjdk.org Thu May 9 09:31:52 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 9 May 2024 09:31:52 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: On Thu, 9 May 2024 04:47:53 GMT, Liang Mao wrote: > > Did you check how many of the stores where g1_can_remove_pre_barrier said false and you would have said true, were elided anyway during store capturing (cf. InitializeNode::capture_store), or as part of G1BarrierSetC2::eliminate_gc_barrier? In other words, how many barriers are you eliding, that were not in fact already elided, just a bit later on? > > > > There is only 1 store that g1_can_remove_pre_barrier return false and was elided by this PR in JBB. Okay. That's what I expected. Given that we are about to remove all of this code in favour of more robust late barrier expansion, I feel like we can live without that one extra store barrier for now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1595206949 From lmao at openjdk.org Thu May 9 09:36:52 2024 From: lmao at openjdk.org (Liang Mao) Date: Thu, 9 May 2024 09:36:52 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: On Thu, 9 May 2024 09:29:24 GMT, Erik Österlund wrote: > > > Did you check how many of the stores where g1_can_remove_pre_barrier said false and you would have said true, were elided anyway during store capturing (cf. InitializeNode::capture_store), or as part of G1BarrierSetC2::eliminate_gc_barrier? In other words, how many barriers are you eliding, that were not in fact already elided, just a bit later on? > > > > > > There is only 1 store that g1_can_remove_pre_barrier return false and was elided by this PR in JBB. > > Okay. That's what I expected. Given that we are about to remove all of this code in favour of more robust late barrier expansion, I feel like we can live without that one extra store barrier for now. ok. That's fairly reasonable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1595211720 From stuefe at openjdk.org Thu May 9 11:39:03 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 9 May 2024 11:39:03 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 11:53:17 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT.
This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> ``` >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach.
Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Some style More treap comments. Will continue later. src/hotspot/share/nmt/nmtTreap.hpp line 203: > 201: TreapNode* last_seen = nullptr; > 202: bool failed = false; > 203: this->visit_in_order([&](TreapNode* node) { Here, and in other places: what's with the this-> ? src/hotspot/share/nmt/nmtTreap.hpp line 209: > 207: } > 208: int c = COMPARATOR::cmp(last_seen->key(), node->key()); > 209: if (c > 0) { No need for c, combine those lines src/hotspot/share/nmt/nmtTreap.hpp line 230: > 228: > 229: void upsert(const K& k, const V& v) { > 230: assert(verify_self(), "invariant"); Here, and down in remove(): I would consider modifying the verification here, or even removing it, and having callers of the treap call it explicitly at convenient times only. My experience from doing very similar things in Metaspace was that such fine-grained verification quickly gets way too expensive, even for a debug build. In metaspace, I ended up doing something like this: https://github.com/openjdk/jdk/blob/ac86f59e4f5382d5c3e8984532dd210611db7dcb/src/hotspot/share/memory/metaspace/metaspaceCommon.hpp#L119 which adds assertions that are only checked every n times, in that case configurable with a switch. src/hotspot/share/nmt/nmtTreap.hpp line 240: > 238: DEBUG_ONLY(_node_count++;) > 239: // Doesn't exist, make node > 240: void* node_place = ALLOCATOR::allocate(sizeof(TreapNode)); Please make it explicit in the class definition that the ALLOCATOR must check for OOM and exit or whatever. src/hotspot/share/nmt/nmtTreap.hpp line 254: > 252: > 253: // (LEQ_k, GT_k) > 254: node_pair fst_split = split(this->_root, k, LEQ); Can we afford some more letters please?
:-) fst : first snd : second But since you use left and right in other places, I'd use that too here. src/hotspot/share/nmt/nmtTreap.hpp line 278: > 276: to_delete.push(head->_left); > 277: to_delete.push(head->_right); > 278: ALLOCATOR::free(head); If we ever generalize this treap, we may want to add an optional dtor call for the payload here. Not for now, though src/hotspot/share/nmt/nmtTreap.hpp line 287: > 285: if (leqB != nullptr && leqB->key() == key) { > 286: return leqB; > 287: } I don't get this, why is this leq search needed? And if its needed, is that not redundant to the code below that compares for key equality? src/hotspot/share/nmt/nmtTreap.hpp line 307: > 305: } > 306: > 307: TreapNode* closest_leq(const K& key) { I don't understand the naming of the variables. What is A? _n? _r? And "_head" is somewhat misleading. I would have named head=pos or current, leqA_n = best or found or candidate or best_so_far... any of these src/hotspot/share/nmt/nmtTreap.hpp line 321: > 319: head = head->_right; > 320: } else if (cmp_r > 0) { > 321: head = head->_left; Just use else, and optionally assert != 0 test/hotspot/gtest/nmt/test_nmt_treap.cpp line 57: > 55: treap.remove(i); > 56: } > 57: } Please test the find functions (leq, geq etc) and iteration. For the latter, please test that we see every value, guaranteed, and that we see every value only exactly once. Also test: - inserting duplicates should result in only one value - empty treap should work as expected - treap with custom allocator, e.g. a simple array based one. Make sure we don't leak memory after nodes are removed, or after remove_all - treap with different custom comparator, please counter check the iteration order - maybe some more types? 
at least the common numericals ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2043200272 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595280793 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595281186 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595327894 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595287004 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595289941 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595291696 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595305355 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595296855 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595294771 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595321380 From stuefe at openjdk.org Thu May 9 11:39:06 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 9 May 2024 11:39:06 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v67] In-Reply-To: References: Message-ID: <5ZO2gFz9Mwb3V8g71tnSpzaGUfEmHsUK_DJbw7fbVAE=.f5c95bc6-fd99-4bde-9c30-7bdcef3234b9@github.com> On Tue, 7 May 2024 11:56:33 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision: > > - Remove GEQ_B > - Move things around slightly to be closer to usage > - Simplify code > - Remove superfluous comment src/hotspot/share/nmt/nmtTreap.hpp line 66: > 64: _right(nullptr) { > 65: } > 66: Condense this code a little? Getters can be one-liners. src/hotspot/share/nmt/nmtTreap.hpp line 87: > 85: TreapNode* _root; > 86: uint64_t _prng_seed; > 87: DEBUG_ONLY(int _node_count;) I would enable _node_count for release, too. We don't save much by omitting it, and we should print the tree state - with at least the node count - as part of NMT diagnostics and in the NMT section of the hs-err file. src/hotspot/share/nmt/nmtTreap.hpp line 94: > 92: static const uint64_t PrngAdd = 0xB; > 93: static const uint64_t PrngModPower = 48; > 94: static const uint64_t PrngModMask = (static_cast<uint64_t>(1) << PrngModPower) - 1; These can all be constexpr. src/hotspot/share/nmt/nmtTreap.hpp line 170: > 168: } > 169: > 170: bool verify_self() { I would not return false, and then assert in the parent. I would assert right in here. Terser code, and you can write much clearer assertion messages without having to think about passing error info up to the caller. If you are worried about gtests blowing up instead of giving errors, I would not care. If this tree implementation has a bug, we need to fix it immediately anyway. (Note that I follow that pattern - asserting verifiers that are also called in gtests - for a lot of code.) I would also make this method debug-only.
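For illustration only (this is not code from the PR): a minimal sketch of the asserting-verifier pattern described above. The `Node` struct and the helper names are hypothetical, and plain `assert` with `NDEBUG` stands in for HotSpot's `assert(cond, msg)` and `ASSERT`-guarded code, which is an assumption made to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a treap node; only what the verifier needs.
struct Node {
  int key;
  Node* left;
  Node* right;
};

// Collect keys in order (left, node, right) using an explicit stack.
inline void in_order(const Node* root, std::vector<int>& out) {
  std::vector<const Node*> stack;
  const Node* cur = root;
  while (cur != nullptr || !stack.empty()) {
    while (cur != nullptr) {
      stack.push_back(cur);
      cur = cur->left;
    }
    cur = stack.back();
    stack.pop_back();
    out.push_back(cur->key);
    cur = cur->right;
  }
}

#ifndef NDEBUG
// Debug-only verifier: it asserts in place instead of returning a bool,
// so each failure carries its own message and no error state is passed up.
inline void verify_self(const Node* root, size_t expected_count) {
  std::vector<int> keys;
  in_order(root, keys);
  assert(keys.size() == expected_count && "node count mismatch");
  for (size_t i = 1; i < keys.size(); i++) {
    assert(keys[i - 1] < keys[i] && "keys must be strictly increasing");
  }
}
#endif
```

A caller (or a gtest) would then invoke `verify_self(root, count)` at convenient points rather than wrapping a bool result in its own assert.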
src/hotspot/share/nmt/nmtTreap.hpp line 171: > 169: > 170: bool verify_self() { > 171: double expected_maximum_depth = log(this->_node_count+1) * 5; Here, and positive_infinity: would make those const or constexpr, whatever applies src/hotspot/share/nmt/nmtTreap.hpp line 189: > 187: if (maximum_depth_found < head.depth) { > 188: maximum_depth_found = head.depth; > 189: } can be a one liner with MAX2 src/hotspot/share/nmt/nmtTreap.hpp line 199: > 197: return false; > 198: } > 199: // Visit everything in order, see that the key ordering is monotonically increasing. I also would verify the node count here. src/hotspot/share/nmt/nmtTreap.hpp line 221: > 219: _prng_seed(seed) > 220: DEBUG_ONLY(COMMA _node_count(0)) { > 221: } weird indentation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592502963 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592677155 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592677689 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592491901 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592498423 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592488033 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592511141 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1592659499 From stuefe at openjdk.org Thu May 9 11:39:07 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 9 May 2024 11:39:07 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 11:22:30 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Some style > > test/hotspot/gtest/nmt/test_nmt_treap.cpp line 57: > >> 55: treap.remove(i); >> 56: } >> 57: } > > Please test the find functions (leq, geq etc) and iteration. 
> > For the latter, please test that we see every value, guaranteed, and that we see every value only exactly once. > > Also test: > - inserting duplicates should result in only one value > - empty treap should work as expected > - treap with custom allocator, e.g. a simple array based one. Make sure we don't leak memory after nodes are removed, or after remove_all > - treap with different custom comparator, please counter check the iteration order > - maybe some more types? at least the common numericals And I would not rely on the internal verification here, but call verify explicitly. (see also note in treap add/remove about verification) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595324461 From stuefe at openjdk.org Thu May 9 11:39:07 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 9 May 2024 11:39:07 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 11:25:58 GMT, Thomas Stuefe wrote: >> test/hotspot/gtest/nmt/test_nmt_treap.cpp line 57: >> >>> 55: treap.remove(i); >>> 56: } >>> 57: } >> >> Please test the find functions (leq, geq etc) and iteration. >> >> For the latter, please test that we see every value, guaranteed, and that we see every value only exactly once. >> >> Also test: >> - inserting duplicates should result in only one value >> - empty treap should work as expected >> - treap with custom allocator, e.g. a simple array based one. Make sure we don't leak memory after nodes are removed, or after remove_all >> - treap with different custom comparator, please counter check the iteration order >> - maybe some more types? at least the common numericals > > And I would not rely on the internal verification here, but call verify explicitly. (see also note in treap add/remove about verification) Please also test the scoped find function with different sets (eg. empty set, 1 item set etc). 
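As a rough sketch of the iteration checks listed above (not code from the PR): the helpers below verify that an iteration saw every key exactly once and in comparator order. `std::map` is used purely as a stand-in for the treap, and the helper names are made up for illustration; a real test would iterate the treap itself and use gtest assertions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Stand-in for the treap: std::map with a pluggable comparator.
// The checks, not the container, are the point of this sketch.
template <typename Compare>
std::vector<int> keys_in_iteration_order(const std::map<int, int, Compare>& m) {
  std::vector<int> seen;
  for (const auto& kv : m) {
    seen.push_back(kv.first);
  }
  return seen;
}

// Returns true if `seen` holds each of the keys 0..n-1 exactly once,
// with neighbours ordered according to `cmp`.
template <typename Compare>
bool each_key_once_in_order(const std::vector<int>& seen, int n, Compare cmp) {
  if (seen.size() != static_cast<size_t>(n)) {
    return false;
  }
  std::vector<int> hits(static_cast<size_t>(n), 0);
  for (size_t i = 0; i < seen.size(); i++) {
    if (seen[i] < 0 || seen[i] >= n) {
      return false;  // a key that was never inserted
    }
    if (++hits[static_cast<size_t>(seen[i])] != 1) {
      return false;  // a key visited more than once
    }
    if (i > 0 && !cmp(seen[i - 1], seen[i])) {
      return false;  // iteration order disagrees with the comparator
    }
  }
  return true;
}
```

The same two helpers cover the empty case (`n == 0`), duplicate inserts (insert each key twice, expect `n` keys), and a reversed comparator such as `std::greater<int>`.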
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1595332326 From kevinw at openjdk.org Thu May 9 11:50:58 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 9 May 2024 11:50:58 GMT Subject: Integrated: 8314225: SIGSEGV in JavaThread::is_lock_owned In-Reply-To: References: Message-ID: <_KauVg_nhC7sc8lwXxxcEUOeJwh2B6HN-4_M-soQWoE=.7fc2de13-ac5a-422b-b23b-447d9dee02d3@github.com> On Wed, 24 Apr 2024 19:50:08 GMT, Kevin Walls wrote: > Removal of JavaThread's MonitorChunks member. This held lock information during deoptimization, but access to it is unnecessary for anything other than the deoptimization itself. > > Access to it in is_lock_owned() was racy, and caused rare crashes. This pull request has now been integrated. Changeset: ad0b54d4 Author: Kevin Walls URL: https://git.openjdk.org/jdk/commit/ad0b54d429fdbd806c09aa06bb42f1ed4a0297e8 Stats: 98 lines in 11 files changed: 14 ins; 70 del; 14 mod 8314225: SIGSEGV in JavaThread::is_lock_owned Reviewed-by: dlong, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/18940 From sspitsyn at openjdk.org Thu May 9 12:48:53 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 9 May 2024 12:48:53 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote: > Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition. > This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. > > Testing: > - TBD: submit mach5 tiers 1-6 PING! Need one more review, please. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19054#issuecomment-2102593940 From kevinw at openjdk.org Thu May 9 13:46:54 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Thu, 9 May 2024 13:46:54 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote: > Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition. > This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. > > Testing: > - TBD: submit mach5 tiers 1-6 Looks good to me. post_class_file_load_hook during VirtualThread$VThreadContinuation.onPinned, and because it's the first onPinned call it causes classloading, so just avoid that circular problem (these may not be the class loading events that most people are looking for most of the time). I wonder if this change is needed only until "jdk.tracePinnedThreads" is removed? Maybe this becomes unnecessary if TRACE_PINNING_MODE is gone, as maybe there won't be any possible classloading. Just a thought, no need for any more in this change. ------------- Marked as reviewed by kevinw (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19054#pullrequestreview-2047999334 From sspitsyn at openjdk.org Thu May 9 14:32:57 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 9 May 2024 14:32:57 GMT Subject: RFR: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: <6bT9AIygcEud6XlOjx-uuGdQk2DfcU83UZUqnB_dWpQ=.41cd47df-61d0-4e54-a184-7d08bc13783b@github.com> On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote: > Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition. > This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. > > Testing: > - TBD: submit mach5 tiers 1-6 Thank you for review, Kevin! The fix is needed anyway independently of the "jdk.tracePinnedThreads" option. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19054#issuecomment-2102778109 From sspitsyn at openjdk.org Thu May 9 14:32:57 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 9 May 2024 14:32:57 GMT Subject: Integrated: 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed In-Reply-To: References: Message-ID: On Thu, 2 May 2024 10:07:35 GMT, Serguei Spitsyn wrote: > Any event posting code except CFLH, ClassPrepare and ClassLoad events has a conditional return in case if the event is posted during a VTMS transition. The CFLH, ClassPrepare and ClassLoad event posting code has just an assert instead. The ClassPrepare and ClassLoad events also have a conditional return in a case of temporary VTMS transition. > This update is to align the CFLH, ClassPrepare and ClassLoad events with all other events in this area. 
> > Testing: > - TBD: submit mach5 tiers 1-6 This pull request has now been integrated. Changeset: c4ff58b9 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/c4ff58b9bcfd08eae0623a648a837e08f25b3f9b Stats: 9 lines in 1 file changed: 2 ins; 2 del; 5 mod 8330146: assert(!_thread->is_in_any_VTMS_transition()) failed Reviewed-by: cjplummer, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/19054 From aph at openjdk.org Thu May 9 16:12:55 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 9 May 2024 16:12:55 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v8] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 03:30:04 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this pacth(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this pacth(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this pacth(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this pacth(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: > > Refine comments src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 472: > 470: } > 471: } > 472: OK, I'm happy to approve this, but these functions are too big now to be inlined. 
Please put them in macroAssembler_aarch64.cpp and we're done. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1595668437 From matsaave at openjdk.org Thu May 9 18:31:24 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 9 May 2024 18:31:24 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap [v3] In-Reply-To: References: Message-ID: > The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. > > These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into pointer_to_offset_8329418 - Ioi comments - Chris comments and cleanup - Merge branch 'master' into pointer_to_offset_8329418 - Cleanup - Corrected SA - Editing SA - Fixed dynamic dumping - Now works with -Xshare:on - Adjusted serialization - ... 
and 2 more: https://git.openjdk.org/jdk/compare/aaa2e59e...11f39483 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19107/files - new: https://git.openjdk.org/jdk/pull/19107/files/d40afef9..11f39483 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19107&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19107&range=01-02 Stats: 17065 lines in 289 files changed: 8650 ins; 5891 del; 2524 mod Patch: https://git.openjdk.org/jdk/pull/19107.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19107/head:pull/19107 PR: https://git.openjdk.org/jdk/pull/19107 From iklam at openjdk.org Thu May 9 19:59:54 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 9 May 2024 19:59:54 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap [v3] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 18:31:24 GMT, Matias Saavedra Silva wrote: >> The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. >> >> These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Merge branch 'master' into pointer_to_offset_8329418 > - Ioi comments > - Chris comments and cleanup > - Merge branch 'master' into pointer_to_offset_8329418 > - Cleanup > - Corrected SA > - Editing SA > - Fixed dynamic dumping > - Now works with -Xshare:on > - Adjusted serialization > - ... and 2 more: https://git.openjdk.org/jdk/compare/c6bab5bc...11f39483 LGTM. Just one small nit. src/hotspot/share/cds/serializeClosure.hpp line 52: > 50: > 51: // Iterate on the pointers from p[0] through p[num_pointers-1] > 52: void do_ptrs(u_char* start, size_t size) { I think it will be more consistent if we use the same `void** p` parameter as in `do_ptr()`. The `u_char*` here is a historical oddity and should be fixed. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19107#pullrequestreview-2048708441 PR Review Comment: https://git.openjdk.org/jdk/pull/19107#discussion_r1595908706 From dcubed at openjdk.org Thu May 9 20:42:55 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 May 2024 20:42:55 GMT Subject: RFR: 8314225: SIGSEGV in JavaThread::is_lock_owned [v3] In-Reply-To: References: Message-ID: On Thu, 2 May 2024 08:59:06 GMT, Kevin Walls wrote: >> Good point. Only JavaThread's can own ObjectMonitors. > > OK yes - can move that to JavaThread, with just adding one cast in synchronizer.cpp, where > ObjectSynchronizer::FastHashCode(Thread*, oop) uses is_lock_owned. > > (ObjectSynchronizer::FastHashCode may be a candidate for taking JavaThread instead, maybe chasing down the users of that is a separate task. 8-) ) FTR: The last time we talked about that we discovered that JVM/TI object tagging still created fast hashcodes and that work is done by the VMThread (or some other non-JavaThread). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18940#discussion_r1595964699 From matsaave at openjdk.org Thu May 9 21:08:36 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 9 May 2024 21:08:36 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap [v4] In-Reply-To: References: Message-ID: <2VF1-BhwWyrgDB09gN9mJRsXhptsQQ5n_wZJAaVlWXo=.9c4a5fbc-f815-4452-a514-52eaeea908d2@github.com> > The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. > > These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: u_char* to void** ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19107/files - new: https://git.openjdk.org/jdk/pull/19107/files/11f39483..7fca6588 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19107&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19107&range=02-03 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19107.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19107/head:pull/19107 PR: https://git.openjdk.org/jdk/pull/19107 From duke at openjdk.org Thu May 9 22:23:02 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 9 May 2024 22:23:02 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v7] In-Reply-To: References: Message-ID: > Performance. 
Before: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s > > Performance, no intrinsic: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ± 10.591 ops/s > > Performance, **with intrinsics*...
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18583/files - new: https://git.openjdk.org/jdk/pull/18583/files/8ff243a2..1ecfdc44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=05-06 Stats: 753 lines in 9 files changed: 303 ins; 101 del; 349 mod Patch: https://git.openjdk.org/jdk/pull/18583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583 PR: https://git.openjdk.org/jdk/pull/18583 From ascarpino at openjdk.org Thu May 9 23:39:06 2024 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Thu, 9 May 2024 23:39:06 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 22:23:02 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ±
10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > whitespace src/java.base/share/classes/sun/security/ec/ECOperations.java line 701: > 699: if (!m.equals(v)) { > 700: java.util.HexFormat hex = java.util.HexFormat.of(); > 701: throw new RuntimeException(); I think your cleanup went too far. You should have some message saying they are not equal, and if you don't want to print hex, remove the code that gets the instance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1596099182 From david.holmes at oracle.com Thu May 9 23:47:58 2024 From: david.holmes at oracle.com (David Holmes) Date: Fri, 10 May 2024 09:47:58 +1000 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> Message-ID: <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> On 9/05/2024 5:40 pm, Alan Bateman wrote: > On 09/05/2024 08:03, David Holmes wrote: >> >> How does such a jpackaged application actually launch/load the JVM?
>> I'm wondering if there is a way to insert a new "shell" environment to >> launch the JVM without having those env vars present ... though I >> guess there may be other env vars that your application still needs. > > For modular applications, there is a jlink option to generate a launcher > (script) for the application. That's a potential place to unset > environment variables that shouldn't be inherited. It may not help here > as it sounds like this is an application image produced by jpackage with > a native launcher, and the warning message is hidden as there is no > console (I assume). > > I think we should consider deprecating and eventually removing > _JAVA_OPTIONS. It's always been problematic that it appends rather than > prepends and it has issues in areas such as quoting. When > JDK_JAVA_OPTIONS was added then we had hoped that developers would move > from the undocumented env variable. The new env variable fixes a bunch > of things in the areas of quoting, arg files, works with launcher > options, and it of course prepends so it doesn't override options. I think overriding options was a feature of `_JAVA_OPTIONS` not a bug - at least at the time. :) But deployment models have evolved (to a point where I don't even know/understand how things get deployed these days and who has control of the command-line and/or the env!). Deprecation may be a reasonable thing but doesn't help the current situation. David > -Alan From duke at openjdk.org Fri May 10 00:15:40 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 10 May 2024 00:15:40 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v8] In-Reply-To: References: Message-ID: > Performance. Before: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ±
4.954 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s > > Performance, no intrinsic: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ± 10.591 ops/s > > Performance, **with intrinsics*...
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: add message back ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18583/files - new: https://git.openjdk.org/jdk/pull/18583/files/1ecfdc44..83b21310 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=06-07 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583 PR: https://git.openjdk.org/jdk/pull/18583 From duke at openjdk.org Fri May 10 00:15:40 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 10 May 2024 00:15:40 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 23:36:03 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > src/java.base/share/classes/sun/security/ec/ECOperations.java line 701: > >> 699: if (!m.equals(v)) { >> 700: java.util.HexFormat hex = java.util.HexFormat.of(); >> 701: throw new RuntimeException(); > > I think your cleanup went too far. You should have some message saying they are not equal and if you don't want to print hex, remove getting an instance. I put the message back. I removed it 'half'-intentionally; I was comparing against the original version, which didn't have any details, and thought maybe I should follow suit. But I did find this message helpful, so it's back.
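A minimal sketch of the kind of check under discussion, assuming plain byte arrays stand in for the two values (`m` and `v` in the quoted snippet have a different type in ECOperations.java). The class name, method name and message wording are hypothetical and are not the text that went into the PR:

```java
import java.util.Arrays;
import java.util.HexFormat;

// Hypothetical helper, for illustration only.
final class MismatchCheck {
    // Throws with a descriptive message when the two encodings differ,
    // so the HexFormat instance is actually used rather than left dangling.
    static void requireEqual(byte[] expected, byte[] actual) {
        if (!Arrays.equals(expected, actual)) {
            HexFormat hex = HexFormat.of();
            throw new RuntimeException("values are not equal: expected "
                    + hex.formatHex(expected) + ", got " + hex.formatHex(actual));
        }
    }
}
```

If printing the hex is not wanted, the alternative raised in the review is to keep a short fixed message and drop the `HexFormat` instance altogether.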
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1596116606 From duke at openjdk.org Fri May 10 00:19:32 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 10 May 2024 00:19:32 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9] In-Reply-To: References: Message-ID: > Performance. Before: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s > > Performance, no intrinsic: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ±
28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ± 10.591 ops/s > > Performance, **with intrinsics*... Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18583/files - new: https://git.openjdk.org/jdk/pull/18583/files/83b21310..8cd095dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583 PR: https://git.openjdk.org/jdk/pull/18583 From ccheung at openjdk.org Fri May 10 00:56:35 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 10 May 2024 00:56:35 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v2] In-Reply-To: References: Message-ID: > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. Calvin Cheung has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - Merge branch 'master' into xloginit-classloading - fix build issues on macos-x64 and -aarch64 - Merge branch 'master' into xloginit-classloading - fix linux-x86 and minimal build issues - 8330198: Add some class loading related perf counters to measure VM startup ------------- Changes: https://git.openjdk.org/jdk/pull/18790/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=01 Stats: 179 lines in 15 files changed: 158 ins; 6 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From matsaave at openjdk.org Fri May 10 01:37:10 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 10 May 2024 01:37:10 GMT Subject: RFR: 8329418: Replace pointers to tables with offsets in relocation bitmap [v2] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 21:39:04 GMT, Chris Plummer wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Chris comments and cleanup > > SA changes look good. Thanks for taking care of this. Thanks for the reviews @plummercj and @iklam! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19107#issuecomment-2103692236 From matsaave at openjdk.org Fri May 10 01:37:10 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 10 May 2024 01:37:10 GMT Subject: Integrated: 8329418: Replace pointers to tables with offsets in relocation bitmap In-Reply-To: References: Message-ID: <1BtZbZp_vkiEPIf3xDeZ05CSCh6mnZ2p3qobDuM-aa0=.acaf9e28-1d99-48ee-90fd-0878c99b8ff6@github.com> On Mon, 6 May 2024 17:05:47 GMT, Matias Saavedra Silva wrote: > The beginning of the RW region contains pointers to c++ vtables which are always located at a fixed offset from the shared base address at runtime. 
This offset can be calculated at dumptime and stored with the read-only tables at the top of the RO region. As a further improvement, all the pointers to RO tables are replaced with offsets as well. > > These changes will reduce the number of pointers in the RW and RO regions and will allow for the relocation bitmap size optimizations to be more effective. Verified with tier 1-5 tests. This pull request has now been integrated. Changeset: a706ca4f Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/a706ca4fdb4db4ba36c6ad04a37c37a348f8af0b Stats: 137 lines in 11 files changed: 64 ins; 24 del; 49 mod 8329418: Replace pointers to tables with offsets in relocation bitmap Reviewed-by: cjplummer, iklam ------------- PR: https://git.openjdk.org/jdk/pull/19107 From duke at openjdk.org Fri May 10 02:17:12 2024 From: duke at openjdk.org (Jin Guojie) Date: Fri, 10 May 2024 02:17:12 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v9] In-Reply-To: References: Message-ID: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. 
> > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% Jin Guojie has updated the pull request incrementally with two additional commits since the last revision: - Move big functions out of macroAssembler_aarch64.hpp - Fix is_neoverse() These macros (CPU_MODEL_NEOVERSE_N1...) are definitions of is_model, not _cpu. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19093/files - new: https://git.openjdk.org/jdk/pull/19093/files/bb417893..2a16eba5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19093&range=07-08 Stats: 58 lines in 4 files changed: 28 ins; 25 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19093/head:pull/19093 PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Fri May 10 02:24:05 2024 From: duke at openjdk.org (Jin Guojie) Date: Fri, 10 May 2024 02:24:05 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v8] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 16:10:37 GMT, Andrew Haley wrote: >> Jin Guojie has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine comments > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 472: > >> 470: } >> 471: } >> 472: > > OK, I'm happy to approve this, but these functions are too big now to be inlined. 
Please put them in macroAssembler_aarch64.cpp and we're done. Thanks. Done in these commits: https://github.com/openjdk/jdk/pull/19093/commits/e54be61f8c5cb6d95908e7435be5eb85b5df65fd https://github.com/openjdk/jdk/pull/19093/commits/2a16eba516ab7ff9b90f9a2d27dc63cf5a90015b ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19093#discussion_r1596177452 From david.holmes at oracle.com Fri May 10 06:55:16 2024 From: david.holmes at oracle.com (David Holmes) Date: Fri, 10 May 2024 16:55:16 +1000 Subject: Where does the openjdk JVM interpreter execute the bytecode instanceof operation In-Reply-To: References: Message-ID: I think the only open question remaining was about logging in the interpreter, in which case AFAIK the only way to log would be to define an InterpreterRuntime function to do it and call that from the assembly code - passing and preserving whatever args/regs are needed. I am not an expert on the details. David On 5/05/2024 6:13 pm, zhengxianwei wrote: > > Thank you. This is my first time using the mailing list, and I wasn't > aware of this issue. > > > I'll make sure to cc hotspot-dev at openjdk.org > now. :-) > > On Sun, May 5, 2024 at 3:47 PM Julian Waters > wrote: > > By the way, when you reply to someone, you should also cc to > hotspot-dev at openjdk.org , for your > message to show up on the mailing lists. That way, more people will > see it and your chances of them helping you increase > > best regards, > Julian > > On Fri, May 3, 2024 at 3:54 PM zhengxianwei > wrote: > > I carefully analyzed it and found that what you said is actually > correct. > > I didn't understand it correctly initially. > > Thanks again for your explanation > > On Fri, May 3, 2024 at 11:03 AM Julian Waters > > wrote: > > Hi Xian Wei, > > No, you are right! The code in templateTable_x86.cpp that > you linked to in your post is not part of the Just in Time > Compilers, it is part of the x86 Interpreter!
The Java > HotSpot VM actually has 2 different Interpreters, > the?primary Interpreter is written in large chunks of > assembly specific to each platform, which is then processed > by the HotSpot macro assemblers. The bytecodeInterpreter.cpp > file you linked to is part of the second and less often used > Interpreter, which is why modifying the > bytecodeInterpreter.cpp instanceof implementation did > nothing in your case (The Interpreter used actually depends > on the platform, and the secondary Interpreter is not used > on ARM or x86). The details on the macro assemblers > unfortunately elude me since I am not a HotSpot expert > (Although I hope to be one day), but to understand how > instanceof works on x86 and ARM, you need to understand both > x86 and ARM assembly. The Interpreter's instanceof opcode is > implemented on x86 in > https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/x86/templateTable_x86.cpp#L4243 and on ARM, it is implemented in https://github.com/openjdk/jdk/blob/6bef0474c8b8773d0d20c0f25c36a2ce9cdbd7e8/src/hotspot/cpu/arm/templateTable_arm.cpp#L4182 > > Happy to help! > > best regards, > Julian > From gli at openjdk.org Fri May 10 07:11:05 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 10 May 2024 07:11:05 GMT Subject: RFR: 8331608: Consolidate EncodeGCModeConcurrentFrameClosure and TransformStackChunkClosure [v3] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 13:46:05 GMT, Guoxiong Li wrote: >> Hi all, >> >> After [JDK-8296875](https://bugs.openjdk.org/browse/JDK-8296875), the classes `EncodeGCModeConcurrentFrameClosure` and `TransformStackChunkClosure` almost have the same code. This patch consolidates them into one. >> >> The tests `make test-hotspot_loom` and `make test-hotspot_gc` passed locally (linux & x64). Thanks for taking the time to review. 
>> >> Best Regards, >> -- Guoxiong > > Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: > > Remove parameter Kindly ping for review. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19084#issuecomment-2104027810 From aph at openjdk.org Fri May 10 07:47:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 07:47:06 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v9] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 02:17:12 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this pacth(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this pacth(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this pacth(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this pacth(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with two additional commits since the last revision: > > - Move big functions out of macroAssembler_aarch64.hpp > - Fix is_neoverse() > > These macros (CPU_MODEL_NEOVERSE_N1...) are definitions of is_model, not _cpu. OK, thanks. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19093#pullrequestreview-2049478289 From aph at openjdk.org Fri May 10 07:52:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 07:52:08 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov wrote: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > > > -------------------------------------------------------------------------------------------- > Version Baseline This patch > -------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt Score Error Score Error Units > -------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op > ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op > ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op > ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 
216.744 ns/op > ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op > ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op > ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op > ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op > ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op > ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op > ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op > ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op > ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ns/op > ArraysHashCode.multibytes 10 avgt 15 5.4... Hi, is this one stuck? What you have today is definitely an improvement, even though it's not as good as what we have for x86. I guess we could commit this and leave widening the arithmetic for a later enhancement if you have no time to work on it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2104113552 From fyang at openjdk.org Fri May 10 08:04:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 May 2024 08:04:07 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 15:07:09 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. 
>> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. 
>> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'master' into jal-fixes > - Revert JNI field, call()->li() > - Use li instead of movptr for call > - REVERT: Use li instead of movptr > - Use li instead of movptr > - VM leaf should use li > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Corrected method name > - ... and 2 more: https://git.openjdk.org/jdk/compare/19e5a392...d53e9694 HI, Thanks for the quick update. Two minor comments remain :-) src/hotspot/cpu/riscv/assembler_riscv.hpp line 2836: > 2834: Rd == x0 && > 2835: is_simm12(offset) && ((offset % 2) == 0)) { > 2836: c_j(offset); Is RV32C-only instructions usable for our case? Or will this if block be test covered? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 653: > 651: push_reg(RegSet::of(t0, xmethod), sp); // push << t0 & xmethod >> to sp > 652: mv(t0, entry_point, offset); > 653: Assembler::jalr(x1, t0, offset); Maybe simply `jalr(t0, offset);`? ------------- PR Review: https://git.openjdk.org/jdk/pull/18942#pullrequestreview-2049498595 PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1596408854 PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1596403929 From aph at openjdk.org Fri May 10 08:05:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 08:05:08 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v7] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 14:06:14 GMT, Andrew Haley wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix arm build error > > Argh, I found it. 
It happens because C2 calls `masm->offset()` from `PhaseOutput::fill_buffer()` after every node is emitted. So that trick isn't going to work. > > It was worth a try, but given that C2 expects offset() to be correct after every node, I think we're stuck. Maybe the last idea you had is the best possible without C2 tinkering. > @theRealAph Could you help review this PR? Thanks. I think we should go with your original simple patch for now. Trying to make the Assembler do the optimal thing has not turned out to be very easy, and I'm worried it's too much of a maintenance burden. Simply emitting `dmb st; dmb ld` for releasing stores is enough for now. Thank you for trying to make this work. I still have in my mind that there might be an easy way to do it, but it's looking unlikely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18467#issuecomment-2104134470 From gli at openjdk.org Fri May 10 08:44:18 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 10 May 2024 08:44:18 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v5] In-Reply-To: <3KivzgORzLhAreonPr-CJki3nXgPznlKMpqI4fQCWuk=.f44a8b19-56e6-4809-ac7b-659d700407af@github.com> References: <3KivzgORzLhAreonPr-CJki3nXgPznlKMpqI4fQCWuk=.f44a8b19-56e6-4809-ac7b-659d700407af@github.com> Message-ID: On Wed, 8 May 2024 10:04:15 GMT, Albert Mingkun Yang wrote: >> It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probablly fail), fallback to full-gc. >> >> Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outsite VM (by third-party tool). >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge branch 'master' into s1-do-collect > - merge > - review > - Merge branch 'master' into s1-do-collect > - s1-do-collect Looks good. One suggestion, but not necessary. src/hotspot/share/gc/serial/serialHeap.cpp line 656: > 654: do_full_collection_no_gc_locker(clear_soft_refs); > 655: } > 656: Maybe the method `do_young_gc` can be renamed to `do_young_collection` or `do_young_collection_no_gc_locker` which is consistent with `do_full_collection` or `do_full_collection_no_gc_locker`. ------------- Marked as reviewed by gli (Committer). PR Review: https://git.openjdk.org/jdk/pull/19056#pullrequestreview-2049609145 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1596463754 From ayang at openjdk.org Fri May 10 08:55:36 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 10 May 2024 08:55:36 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v6] In-Reply-To: References: Message-ID: > It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probablly fail), fallback to full-gc. > > Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outsite VM (by third-party tool). > > Test: tier1-6 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'master' into s1-do-collect - review - Merge branch 'master' into s1-do-collect - merge - review - Merge branch 'master' into s1-do-collect - s1-do-collect ------------- Changes: https://git.openjdk.org/jdk/pull/19056/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=05 Stats: 566 lines in 15 files changed: 125 ins; 356 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/19056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19056/head:pull/19056 PR: https://git.openjdk.org/jdk/pull/19056 From duke at openjdk.org Fri May 10 10:04:21 2024 From: duke at openjdk.org (Jin Guojie) Date: Fri, 10 May 2024 10:04:21 GMT Subject: Integrated: 8331558: AArch64: optimize integer remainder In-Reply-To: References: Message-ID: On Mon, 6 May 2024 01:30:45 GMT, Jin Guojie wrote: > 8331558: AArch64: optimize integer remainder > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 > Add full platform coverage for Neoverse variants in vm_version.?pp > > The following test has passed, which shows definite performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% This pull request has now been integrated. Changeset: dab92c51 Author: ?? 
Committer: Eric Liu URL: https://git.openjdk.org/jdk/commit/dab92c51c70767abcda3b1a91dd7d1a9b40290c1 Stats: 69 lines in 4 files changed: 54 ins; 9 del; 6 mod 8331558: AArch64: optimize integer remainder Reviewed-by: eliu, aph ------------- PR: https://git.openjdk.org/jdk/pull/19093 From duke at openjdk.org Fri May 10 10:15:32 2024 From: duke at openjdk.org (Liming Liu) Date: Fri, 10 May 2024 10:15:32 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 08:14:29 GMT, Stefan Karlsson wrote: >> Liming Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the wrong condition > > Good that you found that `!UseTransparentHugesPages` bug. Hi, @stefank and @jdksjolen, could you please sponsor this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2104339860 From stuefe at openjdk.org Fri May 10 10:21:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 May 2024 10:21:49 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file Message-ID: MEMFLAGS, as well as its enum constants, should live in its own include. The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. --- This patch moves the enum to its new file. It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. 
I tested (built) on: - MacOS aarch64, no precompiled headers, fastdebug - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal ------------- Commit messages: - Update g1MonotonicArena.hpp - Update g1MonotonicArena.hpp - NMT-factor-out-memflags Changes: https://git.openjdk.org/jdk/pull/19172/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19172&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332042 Stats: 225 lines in 25 files changed: 124 ins; 64 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/19172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19172/head:pull/19172 PR: https://git.openjdk.org/jdk/pull/19172 From stuefe at openjdk.org Fri May 10 10:27:10 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 May 2024 10:27:10 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file In-Reply-To: References: Message-ID: On Fri, 10 May 2024 09:06:08 GMT, Thomas Stuefe wrote: > MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. 
> > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal Ping @afshin-zafari @jdksjolen @gerard-ziemski ------------- PR Comment: https://git.openjdk.org/jdk/pull/19172#issuecomment-2104357678 From crschnick at xpipe.io Fri May 10 10:42:13 2024 From: crschnick at xpipe.io (Christopher Schnick) Date: Fri, 10 May 2024 12:42:13 +0200 Subject: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> Message-ID: <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> From my perspective, it doesn't really matter which environment variable you're talking about. Even if there are small differences in which order they apply, they generally all cause the issue of a global configuration interfering with a local isolated self contained runtime image. So _JAVA_OPTIONS and JAVA_TOOL_OPTIONS cause the same problems, with only minor differences. In practice, global environment variables are intended for things like Java 8 applications that run via a globally installed JRE. The huge issue is that there is a chance of an option being included in there that is not supported by more recent JVMs like one for Java 21. If this is the case, then ALL self contained graphical Java applications don't even start up due to an unrecognized option and don't show an error message (If you are running a console based application, then it prints something but for desktop applications there is nothing). As of right now, there is no possibility of running a global JRE/JDK configured with certain environment variable options on the same system as a self contained Java application created with the available JDK tools if the options are not exactly compatible. 
That problem is especially relevant when running JVMs from different vendors for different applications as they differentiate themselves through options. One incompatible option is all it takes for nothing to run anymore. There are multiple different possibilities that I can think of to somehow improve this situation: - Give developers the option to unset these variables in the automatically generated launcher script for jlink. Technically one can modify the launcher script manually, but since it is automatically generated in the beginning, it would be nicer if jlink could do that automatically. Also give developers the option to do the same thing in the generated native jpackage launcher executable. There's currently no other way in jpackage to set any environment variables. - Add some form of JVM option to disable environment variable sourcing for other JVM options. That way this option could be passed in jlink and jpackage, not requiring any modifications to the jlink and jpackage tools. This would also be a good solution. Such an option would also be useful for quick debugging in other cases. On 10/05/2024 01:47, David Holmes wrote: > On 9/05/2024 5:40 pm, Alan Bateman wrote: >> On 09/05/2024 08:03, David Holmes wrote: >>> >>> How does such a jpackaged application actually launch/load the JVM? >>> I'm wondering if there is a way to insert a new "shell" environment >>> to launch the JVM without having those env vars present ... though I >>> guess there may be other env vars that your application still needs. >> >> For modular applications, there is a jlink option to generate a >> launcher (script) for the application. That's a potential place to >> unset environment variables that shouldn't be inherited.? It may not >> help here as it sounds like this is an application image produced by >> jpackage with a native launcher, and the warning message is hidden as >> there is no console (I assume). 
>> >> I think we should consider deprecating and eventually removing >> _JAVA_OPTIONS. It's always been problematic that it appends rather >> than prepend and it has issues in areas such as quoting. When >> JDK_JAVA_OPTIONS was added then we had hoped that developers would >> move from the undocumented env variable. The new env variable fixes a >> bunch of things in the areas of quoting, arg files, works with >> launcher options, and it of course prepends so it doesn't override >> options. > > I think overriding options was a feature of `_JAVA_OPTIONS` not a bug > - at least at the time. :) But deployment models have evolved (to a > point where I don't even know/understand how things get deployed these > days and who has control of the command-line and/or the env!). > Deprecation may be a reasonable thing but doesn't help the current > situation. > > David > >> -Alan From alanb at openjdk.org Fri May 10 11:10:56 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 10 May 2024 11:10:56 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v9] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 02:17:12 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following test has passed, which shows definite performance improvement. 
>> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with two additional commits since the last revision: > > - Move big functions out of macroAssembler_aarch64.hpp > - Fix is_neoverse() > > These macros (CPU_MODEL_NEOVERSE_N1...) are definitions of is_model, not _cpu. There seems to be breakage on aarch64 in tier1 at least. The following are failing: compiler/intrinsics/math/Test8210461.java java/lang/Math/WorstCaseTests.java java/lang/Math/SinCosCornerCasesTests.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/19093#issuecomment-2104416984 From stuefe at openjdk.org Fri May 10 12:15:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 May 2024 12:15:25 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> On Wed, 8 May 2024 11:53:17 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Some style More Treap reviewing, started with VMATree src/hotspot/share/nmt/nmtTreap.hpp line 373: > 371: } > 372: } > 373: }; `visit_range_in_order` looks okay, though it surely is a bit of a brain teaser. So the first half of the algorithm walks the outer-left flank of whatever the current head is, up to the point where we "stick a toe outside the range", then it repeats that part, basically, with the right node, which could be a way in, which is why we again have to walk the left flank. I assume this is based on a paper, but trust you to have checked that. However, I would really like regression testing for this. Lots of range checks in gtest, please (can just be tacked onto what I did ask you for yesterday). src/hotspot/share/nmt/virtualMemoryTracker.hpp line 33: > 31: #include "nmt/allocationSite.hpp" > 32: #include "nmt/nmtCommon.hpp" > 33: #include "runtime/atomic.hpp" Please add includes only where needed, directly. Let's not rely on indirect includes. Unless this is a remnant from some earlier version, then pls just remove it. src/hotspot/share/nmt/vmatree.cpp line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Be a dear and add us, please :-) src/hotspot/share/nmt/vmatree.cpp line 156: > 154: if (to_be_deleted_inbetween_a_b.length() == 0 && LEQ_A_found) { > 155: // We must have smashed a hole in an existing region (or replaced it entirely). > 156: // LEQ_A - A - B - (some node >= B) nit, clearer comment (a bit) since at first glance looks like subtraction: `LEQ_A < A < B < (some node >= B)`. Alternatively, `LEQ_A [A, B) C` src/hotspot/share/nmt/vmatree.cpp line 198: > 196: > 197: // Finally, we can register the new region [A, B)'s summary data.
> 198: auto& rescom = diff.flag[NMTUtil::flag_to_index(metadata.flag)]; Can we please not use `auto` when unnecessary? IDEs have enough of a hard time understanding our code. `SingleDiff&`, please. src/hotspot/share/nmt/vmatree.hpp line 67: > 65: > 66: Metadata(NativeCallStackStorage::StackIndex stack_idx, MEMFLAGS flag) > 67: : stack_idx(stack_idx), flag(flag) {} I would assert here that with state=released, we only ever want to see mtNone. src/hotspot/share/nmt/vmatree.hpp line 91: > 89: StateType type() const { > 90: return static_cast(type_flag[0]); > 91: } Proposal: provide `is_reserved()` and `is_committed()` and replace manual comparisons with the state enum with those. Easier on the eye. src/hotspot/share/nmt/vmatree.hpp line 114: > 112: bool is_noop() { > 113: return (in.type() == StateType::Released && out.type() == StateType::Released) || > 114: (in.type() == out.type() && Metadata::equals(in.metadata(), out.metadata())); We require a released region to be mtNone, state Released. You ensure so below. So here, you should just need the second condition, no special handling for released regions needed. src/hotspot/share/nmt/vmatree.hpp line 117: > 115: } > 116: }; > 117: Do `IntervalState` and `IntervalChange` need to be exposed in the header? src/hotspot/share/nmt/vmatree.hpp line 151: > 149: > 150: SummaryDiff release_mapping(position from, position sz) { > 151: Metadata empty; Just a nit, but instead of the Metadata::Metadata() ctor creating an empty object, could we possibly scrap the default ctor and have an explicit static constexpr Metadata empty with invalid stackindex and mtNone as ctor args? I find that nicer to read. 
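The last few suggestions above (state predicates instead of manual enum comparisons, a simpler `is_noop()`, and an explicit empty constant in place of a default constructor) can be sketched roughly as follows. This is only an illustration under assumed, simplified types: `MemTag`, `RegionData`, `kEmptyRegionData` and the field layout are hypothetical stand-ins, not the code in the PR.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the types under review; not the PR's code.
enum class StateType : uint8_t { Released, Reserved, Committed };
enum class MemTag : uint8_t { None, JavaHeap };  // stand-in for MEMFLAGS (mtNone, mtJavaHeap)

struct RegionData {
  int stack_idx;  // stand-in for NativeCallStackStorage::StackIndex
  MemTag flag;

  // No default constructor: callers must say what they mean.
  constexpr RegionData(int stack_idx, MemTag flag)
    : stack_idx(stack_idx), flag(flag) {}

  static constexpr bool equals(const RegionData& a, const RegionData& b) {
    return a.stack_idx == b.stack_idx && a.flag == b.flag;
  }
};

// Explicit "empty" value: invalid stack index (assumed to be -1 here) and no memory tag.
constexpr RegionData kEmptyRegionData{-1, MemTag::None};

struct IntervalState {
  StateType type;
  RegionData data;

  // Predicates replace manual comparisons against the state enum.
  constexpr bool is_released()  const { return type == StateType::Released; }
  constexpr bool is_reserved()  const { return type == StateType::Reserved; }
  constexpr bool is_committed() const { return type == StateType::Committed; }
};

struct IntervalChange {
  IntervalState in;
  IntervalState out;

  // If released states always carry kEmptyRegionData, two released states
  // already compare equal here, so no special case for Released is needed.
  constexpr bool is_noop() const {
    return in.type == out.type && RegionData::equals(in.data, out.data);
  }
};
```

The simplification of `is_noop()` only holds if the invariant "released implies empty data" is enforced where states are created, which is what the suggested assert in the constructor would check.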
------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2049875773 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596632543 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596633794 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596649353 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596665032 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596670430 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596647037 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596667575 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596645720 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596659540 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596644523 From stuefe at openjdk.org Fri May 10 12:15:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 May 2024 12:15:25 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> References: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> Message-ID: <5V3WB9R1kvj6MFOU8rt8_XeiMJy4UHmS-DvSHxwiwGE=.b7514b9c-e875-4621-a325-a068b7754358@github.com> On Fri, 10 May 2024 11:41:05 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Some style > > src/hotspot/share/nmt/vmatree.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > > Be a dear and add us, please :-) Though by this time, you spent more time on the VMATree than I originally did. Still ..
> src/hotspot/share/nmt/vmatree.hpp line 151: > >> 149: >> 150: SummaryDiff release_mapping(position from, position sz) { >> 151: Metadata empty; > > Just a nit, but instead of the Metadata::Metadata() ctor creating an empty object, could we possibly scrap the default ctor and have an explicit static constexpr Metadata empty with invalid stackindex and mtNone as ctor args? I find that nicer to read. Oh, another thing, maybe rename this to something else. `Metadata` has a clear meaning in hotspot. Maybe something like RegionData? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596677600 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1596676673 From jkratochvil at openjdk.org Fri May 10 12:19:29 2024 From: jkratochvil at openjdk.org (Jan Kratochvil) Date: Fri, 10 May 2024 12:19:29 GMT Subject: Integrated: 8331352: error: template-id not allowed for constructor/destructor in C++20 In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 02:01:01 GMT, Jan Kratochvil wrote: > When compiling trunk (819f3d6fc70ff6fe54ac5f9033c17c3dd4326aa5 2024-04-29) by gcc-14.0.1-0.15.fc40.x86_64 there are many errors: > > In file included from src/hotspot/share/memory/allocation.hpp:30, > from src/hotspot/share/ci/ciBaseObject.hpp:29, > from src/hotspot/share/ci/ciMetadata.hpp:28, > from src/hotspot/share/ci/ciType.hpp:28, > from src/hotspot/share/ci/ciKlass.hpp:28, > from src/hotspot/share/ci/ciArrayKlass.hpp:28, > from src/hotspot/share/ci/ciArray.hpp:28, > from src/hotspot/share/ci/compilerInterface.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.hpp:28, > from src/hotspot/share/compiler/abstractCompiler.cpp:25: > src/hotspot/share/utilities/linkedlist.hpp:85:15: error: template-id not allowed for constructor in C++20 [-Werror=template-id-cdtor] > 85 | NONCOPYABLE(LinkedList); > | ^~~~~~~~~~~~~ > src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ‘NONCOPYABLE’
> 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */ > | ^ > src/hotspot/share/utilities/linkedlist.hpp:85:15: note: remove the ‘< >’ > 85 | NONCOPYABLE(LinkedList); > | ^~~~~~~~~~~~~ > src/hotspot/share/utilities/globalDefinitions.hpp:87:26: note: in definition of macro ‘NONCOPYABLE’ > 87 | #define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */ > | ^ > > In file included from src/hotspot/share/gc/z/zGranuleMap.inline.hpp:30, > from src/hotspot/share/gc/z/zForwardingTable.inline.hpp:32, > from src/hotspot/share/gc/z/zHeap.inline.hpp:30, > from src/hotspot/share/gc/z/zGeneration.inline.hpp:30, > from src/hotspot/share/gc/z/zBarrier.inline.hpp:30, > from src/hotspot/share/gc/z/zBarrierSet.inline.hpp:31, > from src/hotspot/share/gc/shared/barrierSetConfig.inline.hpp:44, > from src/hotspot/share/oops/access.inline.hpp:31, > from src/hotspot/share/memory/iterator.inline.hpp:32, > from src/hotspot/share/oops/oop.inline.hpp:31, > from src/hotspot/share/compiler/abstractDisassembler.cpp:32: > src/hotspot/share/gc/z/zArray.inline.hpp:99:21: error: template-id not allowed f... This pull request has now been integrated.
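For readers unfamiliar with the diagnostic quoted above, here is a minimal sketch of the pattern, assuming a made-up class template `List` in place of the HotSpot classes. The archived text does not include the actual diff, so this only illustrates one common way to satisfy `-Wtemplate-id-cdtor`: pass the injected class name to the macro instead of a template-id. The caret width in the quoted error suggests the archive dropped the angle-bracket arguments from the original `NONCOPYABLE(...)` lines, but that is an inference.

```cpp
#include <type_traits>

// Same shape as the macro shown in the quoted compiler output.
#define NONCOPYABLE(C) C(C const&) = delete; C& operator=(C const&) = delete /* next token must be ; */

// Hypothetical stand-in for a class template such as the ones named in the errors.
template <class E>
class List {
 public:
  // Declaring a copy constructor suppresses the implicit default constructor,
  // so it is requested explicitly.
  List() = default;

  // NONCOPYABLE(List<E>);
  //   would expand to "List<E>(List<E> const&) = delete; ...", i.e. a constructor
  //   declared with a template-id, which is what GCC 14 reports in C++20 mode.

  // Inside the template, the injected class name "List" already means List<E>,
  // so the constructor is declared without a template-id.
  NONCOPYABLE(List);
};
```

With this form the class is still non-copyable for every instantiation, which can be checked with `std::is_copy_constructible` and `std::is_copy_assignable`.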
Changeset: 45792c58 Author: Jan Kratochvil Committer: Yuri Nesterenko URL: https://git.openjdk.org/jdk/commit/45792c5829fb1d5ee016c4a1fd6badb5d2b4239c Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod 8331352: error: template-id not allowed for constructor/destructor in C++20 Reviewed-by: kbarrett, stefank ------------- PR: https://git.openjdk.org/jdk/pull/19009 From duke at openjdk.org Fri May 10 12:21:05 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Fri, 10 May 2024 12:21:05 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 10 May 2024 07:49:32 GMT, Andrew Haley wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ±
0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Hi, is this one stuck? What you have today is definitely an improvement, even though it's not as good as what we have for x86. I guess we could commit this and leave widening the arithmetic for a later enhancement if you have no time to work on it. Hi @theRealAph , following your suggestions I've got this working for ints and can confirm that it improves the performance. I don't have enough time at the moment to finish it for shorts and bytes though. I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2104449607 From duke at openjdk.org Fri May 10 12:29:04 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Fri, 10 May 2024 12:29:04 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 10 May 2024 07:49:32 GMT, Andrew Haley wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ±
0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Hi, is this one stuck? What you have today is definitely an improvement, even though it's not as good as what we have for x86. I guess we could commit this and leave widening the arithmetic for a later enhancement if you have no time to work on it. Hi @theRealAph , following your suggestions I've got this working for ints and can confirm that it improves the performance. I don't have enough time at the moment to finish it for shorts and bytes though. I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2104520482 From aph at openjdk.org Fri May 10 12:41:15 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 12:41:15 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov wrote: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770).
It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > > > -------------------------------------------------------------------------------------------- > Version Baseline This patch > -------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt Score Error Score Error Units > -------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op > ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op > ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op > ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op > ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op > ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op > ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op > ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op > ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op > ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op > ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op > ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op > ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ns/op > ArraysHashCode.multibytes 10 avgt 15 5.4...
Hi, > I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good? Yes, that's right. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2104537521 From aph at openjdk.org Fri May 10 13:16:24 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 13:16:24 GMT Subject: RFR: 8332066: AArch64: Math test failures since JDK-8331558 Message-ID: Revert "8331558: AArch64: optimize integer remainder" This reverts commit dab92c51c70767abcda3b1a91dd7d1a9b40290c1. ------------- Commit messages: - 8332066: AArch64: Math test failures since JDK-8331558 Changes: https://git.openjdk.org/jdk/pull/19177/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19177&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332066 Stats: 69 lines in 4 files changed: 9 ins; 54 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19177.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19177/head:pull/19177 PR: https://git.openjdk.org/jdk/pull/19177 From kvn at openjdk.org Fri May 10 14:50:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 May 2024 14:50:14 GMT Subject: RFR: 8332066: AArch64: Math test failures since JDK-8331558 In-Reply-To: References: Message-ID: On Fri, 10 May 2024 13:12:27 GMT, Andrew Haley wrote: > Revert "8331558: AArch64: optimize integer remainder" > This reverts commit dab92c51c70767abcda3b1a91dd7d1a9b40290c1. Good and trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19177#pullrequestreview-2050247378 From aph at openjdk.org Fri May 10 15:09:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 May 2024 15:09:12 GMT Subject: Integrated: 8332066: AArch64: Math test failures since JDK-8331558 In-Reply-To: References: Message-ID: On Fri, 10 May 2024 13:12:27 GMT, Andrew Haley wrote: > Revert "8331558: AArch64: optimize integer remainder" > This reverts commit dab92c51c70767abcda3b1a91dd7d1a9b40290c1. This pull request has now been integrated. Changeset: d215bc46 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/d215bc46475b90abd898e995c1b4a6aa4b6cb024 Stats: 69 lines in 4 files changed: 9 ins; 54 del; 6 mod 8332066: AArch64: Math test failures since JDK-8331558 Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/19177 From wkemper at openjdk.org Fri May 10 16:20:34 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 10 May 2024 16:20:34 GMT Subject: RFR: 8332082: Shenandoah: Use SATB active flag for C2 pre-write barrier on x86 and PPC Message-ID: This is consistent with c1 and other platforms. 
------------- Commit messages: - Check for satb active flag, rather than gc state Changes: https://git.openjdk.org/jdk/pull/19180/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19180&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332082 Stats: 15 lines in 2 files changed: 9 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19180.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19180/head:pull/19180 PR: https://git.openjdk.org/jdk/pull/19180 From Bruno.Borges at microsoft.com Fri May 10 16:40:54 2024 From: Bruno.Borges at microsoft.com (Bruno Borges) Date: Fri, 10 May 2024 16:40:54 +0000 Subject: [EXTERNAL] Re: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> Message-ID: Java runtime sharing (among multiple applications in the same environment) has become less and less important, and I think that is what those environment variables were meant for, to ensure any JVM would start with values from these env vars. But deployment models have certainly evolved: * More than half of Java applications in the Cloud are deployed as containers (see New Relic report from 2023). * Java applications deployed to Virtual Machines tend to have exclusivity over the VM resources. Example: big data solutions are pushed to VMs dedicated to them. * Developers tend to have multiple JDKs installed these days, from 8 all the way to 21. Expecting flags in those environment variables to work consistently across all versions is unrealistic. * Some developer tools have been shipping their own java runtimes for quite some time already (e.g. JetBrains and Eclipse IDEs). I do like Christopher's suggestion of an option in the JVM to disable environment variable sourcing of _JAVA_OPTIONS and JAVA_TOOL_OPTIONS. 
It gives back control to the application developer on how the runtime should behave, especially in the scenario of Java desktop applications, and it would align with the intents of jlink/jpackage. ________________________________ From: hotspot-dev on behalf of Christopher Schnick Sent: May 10, 2024 3:42 AM To: David Holmes Cc: hotspot-dev at openjdk.org Subject: [EXTERNAL] Re: External _JAVA_OPTIONS environment variable sourcing for self-contained applications [You don't often get email from crschnick at xpipe.io. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] From my perspective, it doesn't really matter which environment variable you're talking about. Even if there are small differences in which order they apply, they generally all cause the issue of a global configuration interfering with a local isolated self contained runtime image. So _JAVA_OPTIONS and JAVA_TOOL_OPTIONS cause the same problems, with only minor differences. In practice, global environment variables are intended for things like Java 8 applications that run via a globally installed JRE. The huge issue is that there is a chance of an option being included in there that is not supported by more recent JVMs like one for Java 21. If this is the case, then ALL self contained graphical Java applications don't even start up due to an unrecognized option and don't show an error message (If you are running a console based application, then it prints something but for desktop applications there is nothing). As of right now, there is no possibility of running a global JRE/JDK configured with certain environment variable options on the same system as a self contained Java application created with the available JDK tools if the options are not exactly compatible. That problem is especially relevant when running JVMs from different vendors for different applications as they differentiate themselves through options. 
One incompatible option is all it takes for nothing to run anymore. There are multiple different possibilities that I can think of to somehow improve this situation: - Give developers the option to unset these variables in the automatically generated launcher script for jlink. Technically one can modify the launcher script manually, but since it is automatically generated in the beginning, it would be nicer if jlink could do that automatically. Also give developers the option to do the same thing in the generated native jpackage launcher executable. There's currently no other way in jpackage to set any environment variables. - Add some form of JVM option to disable environment variable sourcing for other JVM options. That way this option could be passed in jlink and jpackage, not requiring any modifications to the jlink and jpackage tools. This would also be a good solution. Such an option would also be useful for quick debugging in other cases. On 10/05/2024 01:47, David Holmes wrote: > On 9/05/2024 5:40 pm, Alan Bateman wrote: >> On 09/05/2024 08:03, David Holmes wrote: >>> >>> How does such a jpackaged application actually launch/load the JVM? >>> I'm wondering if there is a way to insert a new "shell" environment >>> to launch the JVM without having those env vars present ... though I >>> guess there may be other env vars that your application still needs. >> >> For modular applications, there is a jlink option to generate a >> launcher (script) for the application. That's a potential place to >> unset environment variables that shouldn't be inherited. It may not >> help here as it sounds like this is an application image produced by >> jpackage with a native launcher, and the warning message is hidden as >> there is no console (I assume). >> >> I think we should consider deprecating and eventually removing >> _JAVA_OPTIONS. It's always been problematic that it appends rather >> than prepend and it has issues in areas such as quoting. 
When >> JDK_JAVA_OPTIONS was added then we had hoped that developers would >> move from the undocumented env variable. The new env variable fixes a >> bunch of things in the areas of quoting, arg files, works with >> launcher options, and it of course prepends so it doesn't override >> options. > > I think overriding options was a feature of `_JAVA_OPTIONS` not a bug > - at least at the time. :) But deployment models have evolved (to a > point where I don't even know/understand how things get deployed these > days and who has control of the command-line and/or the env!). > Deprecation may be a reasonable thing but doesn't help the current > situation. > > David > >> -Alan -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdnilsen at openjdk.org Fri May 10 16:55:03 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 10 May 2024 16:55:03 GMT Subject: RFR: 8332082: Shenandoah: Use SATB active flag for C2 pre-write barrier on x86 and PPC In-Reply-To: References: Message-ID: On Fri, 10 May 2024 16:13:51 GMT, William Kemper wrote: > This is consistent with c1 and other platforms. Marked as reviewed by kdnilsen (no project role). ------------- PR Review: https://git.openjdk.org/jdk/pull/19180#pullrequestreview-2050483938 From rkennke at openjdk.org Fri May 10 17:05:28 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 May 2024 17:05:28 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v10] In-Reply-To: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: > The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. 
Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. > > The proposed fix aims to always enter the main loop(s) with an aligned address: > - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. > - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. > > Testing: > - [x] tier1 (+CCP) > - [x] tier1 (-CCP) > - [x] tier2 (+CCP) > - [x] tier2 (-CCP) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18948/files - new: https://git.openjdk.org/jdk/pull/18948/files/8f7fd92d..0f11b014 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18948&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18948.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18948/head:pull/18948 PR: https://git.openjdk.org/jdk/pull/18948 From rkennke at openjdk.org Fri May 10 21:16:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 May 2024 21:16:25 GMT Subject: RFR: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP [v10] In-Reply-To: References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: <4Mdtuos5Uc2vjCXBbFs_tdnsm5-_VLqIlav0uT9yb0I=.04b8ab22-a3ba-4ec9-a5e0-c3ddb4c6e0f8@github.com> On Fri, 10 May 2024 
17:05:28 GMT, Roman Kennke wrote: >> The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. >> >> The proposed fix aims to always enter the main loop(s) with an aligned address: >> - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. >> - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. >> >> Testing: >> - [x] tier1 (+CCP) >> - [x] tier1 (-CCP) >> - [x] tier2 (+CCP) >> - [x] tier2 (-CCP) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace Thanks for all reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18948#issuecomment-2105278620 From rkennke at openjdk.org Fri May 10 21:16:26 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 May 2024 21:16:26 GMT Subject: Integrated: 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP In-Reply-To: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> References: <_HzINQ0atD5BmBbIZ6A4A5y1wNvwsvrBxAiaz2Mk9rY=.43cde0ae-1179-4708-afa1-fda64039d722@github.com> Message-ID: On Thu, 25 Apr 2024 10:38:55 GMT, Roman Kennke wrote: > The implementations of Arrays.equals() in macroAssembler_aarch64.cpp, MacroAssembler::arrays_equals() assumes that the start of arrays is 8-byte-aligned. Since [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) this is no longer the case, at least when running with -CompressedClassPointers (or Lilliput). The effect is that the loops may run over the array end, and if the array is at heap boundary, and that memory is unmapped, then it may crash. > > The proposed fix aims to always enter the main loop(s) with an aligned address: > - When the array base is 8-byte-aligned (default, with +CCP), then compare the array lengths separately, then enter the main loop with the array base. > - When the array base is not 8-byte-aligned (-CCP and Lilliput), then enter the loop with the address of the array-length (which is then 8-byte-aligned), and compare array lengths in the main loop, and elide the explicit array lengths comparison. > > Testing: > - [x] tier1 (+CCP) > - [x] tier1 (-CCP) > - [x] tier2 (+CCP) > - [x] tier2 (-CCP) This pull request has now been integrated. 
Changeset: 1dac34fa Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/1dac34fa757f1d603f0bc9b9c1994c114c276add Stats: 32 lines in 1 file changed: 13 ins; 9 del; 10 mod 8331098: [Aarch64] Fix crash in Arrays.equals() intrinsic with -CCP Reviewed-by: aboldtch, aph ------------- PR: https://git.openjdk.org/jdk/pull/18948 From dcubed at openjdk.org Fri May 10 22:12:21 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 10 May 2024 22:12:21 GMT Subject: RFR: 8330969: scalability issue with loaded JVMTI agent [v2] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 01:49:31 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiThreadState.cpp line 366: >> >>> 364: attempts--; >>> 365: } >>> 366: DEBUG_ONLY(if (attempts == 0) break;) >> >> Previously `_VTMS_transition_count` considered all threads at the same time. Now you are iterating through the threads and looking at a flag in each one. Is it guaranteed that once the `_VTMS_transition_mark` flag has been verified not to be set in a thread it won't get set while still iterating in the threads loop? > > Thank you for the comment. It is thinking in a right direction. > Each `JavaThread` set the `VTM_transition_mark` only once and then checks for disable counters: > - `_VTMS_transition_disable_for_all_count` > - `java_lang_Thread::VTMS_transition_disable_count(vth())` > > If any of the disable counters is not zero then each `JavaThread` clears the optimistically set mark and continues under protection of the `JvmtiVTMSTransition_lock`. I'm not sure this answered Chris' query properly. Or I'm reading Chris' query wrong. Perhaps this is not what Chris had in mind, but I'm wondering what happens in some Thread-A when it is checked and passed by but then Thread-A sets the flag in itself after the for-loop has passed it by. Does that Thread-A flag value get lost? 
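[Editorial sketch, not part of the original message.] The optimistic-mark protocol Serguei describes in the quoted reply can be illustrated with a small model. This is only a sketch under stated assumptions: `ThreadSketch`, `try_start_transition_fast`, `disable_and_scan` and the other names are hypothetical stand-ins, not HotSpot declarations, and the model uses sequentially consistent `std::atomic` operations, which may not match the ordering used in the actual patch. Under those assumptions, a thread that sets its mark after the scan has passed it must then read the disable counter, which the disabler incremented before scanning, so it backs off to the slow path.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical per-thread state; stands in for the per-JavaThread mark.
struct ThreadSketch {
  std::atomic<bool> transition_mark{false};
};

// Hypothetical global counter; stands in for a "disable for all" count.
std::atomic<int> disable_for_all_count{0};

// Transitioning thread, fast path.
// Returns true if the mark stays set and the thread may proceed;
// returns false if it backed off and must take a lock-protected slow path
// (the slow path itself is not modelled here).
bool try_start_transition_fast(ThreadSketch& self) {
  self.transition_mark.store(true);           // 1. set the mark optimistically
  if (disable_for_all_count.load() != 0) {    // 2. only then check the counter
    self.transition_mark.store(false);        // 3. a disabler is active: clear it
    return false;
  }
  return true;
}

void finish_transition(ThreadSketch& self) {
  self.transition_mark.store(false);
}

// Disabler: publish the counter first, then scan the per-thread marks.
// Returns true if no thread was marked during this scan. Real code would
// wait and retry instead of returning false.
bool disable_and_scan(const std::vector<ThreadSketch*>& threads) {
  disable_for_all_count.fetch_add(1);
  for (const ThreadSketch* t : threads) {
    if (t->transition_mark.load()) {
      return false;
    }
  }
  return true;
}

void enable() {
  int prev = disable_for_all_count.fetch_sub(1);
  assert(prev > 0);  // enable() must pair with an earlier disable_and_scan()
  (void)prev;
}
```

Whether the real implementation closes the window asked about above depends on the memory ordering of the actual flag and counter accesses, which this model does not establish.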
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18937#discussion_r1597266296 From iklam at openjdk.org Fri May 10 23:07:06 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 10 May 2024 23:07:06 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v2] In-Reply-To: References: Message-ID: <7X-7dZ5vX54h9wzSJNIuDDqLxHZ6562nQLK2r9Kv54U=.c67a9f9f-714c-45f9-968b-4178a67f6fdd@github.com> On Fri, 10 May 2024 00:56:35 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. >> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. >> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. > > Calvin Cheung has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into xloginit-classloading > - fix build issues on macos-x64 and -aarch64 > - Merge branch 'master' into xloginit-classloading > - fix linux-x86 and minimal build issues > - 8330198: Add some class loading related perf counters to measure VM startup src/hotspot/share/runtime/java.cpp line 245: > 243: #else > 244: > 245: void print_method_invocation_histogram() {} Is this change necessary? 
src/hotspot/share/runtime/perfData.hpp line 420: > 418: inline void inc(jlong val) { (*(jlong*)_valuep) += val; } > 419: inline void dec(jlong val) { inc(-val); } > 420: inline void reset() { (*(jlong*)_valuep) = 0; } This new function doesn't seem to be used. src/hotspot/share/runtime/perfData.hpp line 835: > 833: public: > 834: inline PerfTraceTime(PerfLongCounter* timerp, bool is_on = true) : _timerp(timerp) { > 835: if (!is_on || !UsePerfData) return; Instead of having a separate `is_on` parameter, can we check for `timerp == nullptr` instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1597288341 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1597289948 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1597289730 From stuefe at openjdk.org Sat May 11 10:03:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 May 2024 10:03:25 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS Message-ID: An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK. Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included. ---- Note that I was torn between two ways to fix this: - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`). See this draft PR: https://github.com/openjdk/jdk/pull/19178 . The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required. I can go either way, though I have a slight preference for this PR, which is why I posted it.
------------- Commit messages: - JDK-8332105-Exploded-JDK-should-count-as-if-CDS-had-not-been-included-in-the-build Changes: https://git.openjdk.org/jdk/pull/19188/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19188&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332105 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19188/head:pull/19188 PR: https://git.openjdk.org/jdk/pull/19188 From duke at openjdk.org Sat May 11 11:15:03 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 11 May 2024 11:15:03 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v9] In-Reply-To: <3x1oThcCfOj6FR0ZJoH5ipYkrHTFAzrgJXm69Tggb8k=.83dba355-787a-4f05-a721-df5aee8fd810@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> <7Aud9EX-Q09Bx3MmZjM182gBp9sDmbvIt7rSmtBa1FM=.cc43a81c-7431-484d-9eae-295da93c9a52@github.com> <3x1oThcCfOj6FR0ZJoH5ipYkrHTFAzrgJXm69Tggb8k=.83dba355-787a-4f05-a721-df5aee8fd810@github.com> Message-ID: On Mon, 6 May 2024 22:19:10 GMT, Chris Plummer wrote: > In SA I see references to heapRegionIterate() that possibly should be renamed. > > I noticed that the HeapRegionManager and HeapRegionClosure classes were not renamed (in the hotspot source). Is this intentional or an oversite? OK, I will do all the SA part here. However, I do think that the other classes named 'HeapRegion*' in the hotspot source should be dealt with in follow-up PRs. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18871#issuecomment-2105680607 From duke at openjdk.org Sun May 12 02:55:45 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sun, 12 May 2024 02:55:45 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v10] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: > follow up 8267941 Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18871/files - new: https://git.openjdk.org/jdk/pull/18871/files/b007eb01..65a4bbf9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=08-09 Stats: 29 lines in 8 files changed: 0 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From duke at openjdk.org Sun May 12 03:07:15 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sun, 12 May 2024 03:07:15 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v10] In-Reply-To: References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: On Sun, 12 May 2024 02:55:45 GMT, Lei Zaakjyu wrote: >> follow up 8267941 > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > rename Should we also rename 'HeapRegionType' to 'G1HeapRegionType', then rename the current 'G1HeapRegionType' to 'G1 HeapRegionTypeEnum'? 
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/g1/G1CollectedHeap.java line 131: > 129: if (hr.isInRegion(addr)) { > 130: return hr; > 131: } Since these three methods are G1 specific, I'd prefer not to add the 'g1' prefix. ------------- PR Review: https://git.openjdk.org/jdk/pull/18871#pullrequestreview-2051282649 PR Review Comment: https://git.openjdk.org/jdk/pull/18871#discussion_r1597540754 From duke at openjdk.org Sun May 12 06:01:27 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sun, 12 May 2024 06:01:27 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v11] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: <-TstTKGbE-Ewn6GcQHrBqW4XQPpeMmOwxb-TeeXMLdA=.4e38776a-1439-4112-9350-9cec07c0bd83@github.com> > follow up 8267941 Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18871/files - new: https://git.openjdk.org/jdk/pull/18871/files/65a4bbf9..dafdc775 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From jsjolen at openjdk.org Sun May 12 10:54:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 12 May 2024 10:54:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> References: 
<36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> Message-ID: <2g5NtAKCzkk_SO8cDcBmpxNdBW5xx0QcDO_N7nctEsw=.cb259fce-2efb-47d6-81d2-7e84f01e126d@github.com> On Fri, 10 May 2024 11:21:51 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Some style > > src/hotspot/share/nmt/nmtTreap.hpp line 373: > >> 371: } >> 372: } >> 373: }; > > `visit_range_in_order` looks okay, though it surely is a bit of a brain teaser. So the first half of the algorithm walks the outer-left flank of whatever the current head is, up to the point where we "stick a toe outside the range", then it repeats that part, basically, with the right node, which could be a way in, which is why we again have to walk the left flank. > > I assume this is based on a paper, but trust you to have checked that. However, I would really like regression testing for this. Lots of range checks in gtest, please (can just be tacked onto what I did ask you for yesterday). If you look at the `visit_in_order` function you see the typical iterative algorithm for visiting a binary tree in order; this one is that algorithm, modified so that it can shorten the search a bit. This is a case of the recursive algorithm being far clearer, with the stack being implicit.
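[Editorial sketch, not part of the original message.] The recursive, range-limited in-order walk mentioned here can be tried out on a plain binary search tree. This is only a sketch: `Node`, `insert` and `keys_in_range` are hypothetical helpers written for this illustration, not the NMT treap from `nmtTreap.hpp`, and the tree is an unbalanced BST with `int` keys. The walk visits keys in the half-open range `[from, to)` in ascending order and skips subtrees that cannot contain keys in that range.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical minimal BST node; not the treap node used by NMT.
struct Node {
  int key;
  std::unique_ptr<Node> left;
  std::unique_ptr<Node> right;
  explicit Node(int k) : key(k) {}
};

// Plain unbalanced BST insert; duplicate keys are ignored.
void insert(std::unique_ptr<Node>& head, int key) {
  if (!head) {
    head = std::make_unique<Node>(key);
    return;
  }
  if (key < head->key) {
    insert(head->left, key);
  } else if (key > head->key) {
    insert(head->right, key);
  }
}

// Visit every key k with from <= k < to, in ascending order.
void walk_in_order(int from, int to, const std::function<void(int)>& f,
                   const Node* head) {
  if (head == nullptr) return;
  const int hk = head->key;
  if (hk >= from) {
    // Keys smaller than hk may still be in range, so descend left.
    walk_in_order(from, to, f, head->left.get());
  }
  if (hk >= from && hk < to) {
    f(hk);
  }
  if (hk < to) {
    // Keys larger than hk may still be in range, so descend right.
    walk_in_order(from, to, f, head->right.get());
  }
}

// Convenience wrapper: build a tree from keys and collect those in [from, to).
std::vector<int> keys_in_range(const std::vector<int>& keys, int from, int to) {
  assert(from <= to);
  std::unique_ptr<Node> root;
  for (int k : keys) {
    insert(root, k);
  }
  std::vector<int> out;
  walk_in_order(from, to, [&out](int k) { out.push_back(k); }, root.get());
  return out;
}
```

For example, with the keys 50, 30, 70, 20, 40, 60, 80 and the range `[30, 70)`, the subtrees below 20 and 80 are never entered.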
In pseudo-code:

    walk_in_order(from, to, f, head) {
      if (head == nullptr) return;
      hk = head->key();
      if (hk >= from) {
        walk_in_order(from, to, f, head->left());
      }
      if (hk >= from && hk < to) {
        f(hk);
      }
      if (hk < to) {
        walk_in_order(from, to, f, head->right());
      }
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1597610845 From azafari at openjdk.org Sun May 12 17:54:03 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Sun, 12 May 2024 17:54:03 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file In-Reply-To: References: Message-ID: On Fri, 10 May 2024 09:06:08 GMT, Thomas Stuefe wrote: > MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. > > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal src/hotspot/share/services/mallocLimit.hpp line 3: > 1: /* > 2: * Copyright (c) 2023 SAP SE. All rights reserved. > 3: * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved. Maybe 2023, 2024 instead?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1597683285 From lmesnik at openjdk.org Sun May 12 21:38:32 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 12 May 2024 21:38:32 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't be Finalizable Message-ID: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> nsk.share.Log does some cleanup and reports errors in its cleanup method. That method was originally meant to be executed by the finalizer; now it is called only from a shutdown hook. Cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 The cleanup method flushes the stream and prints a summary, which should already have been printed by the complain method. This cleanup is not necessary, and summary printing is usually disabled. It is enabled only if the test has called the 'complain' method, but in that case the error should already have been printed by that method. Note: 'verboseOnErrorEnabled' is simply not used. See isVerboseOnErrorEnabled.
    public boolean isVerboseOnErrorEnabled() {
        return errorsSummaryEnabled;
    }

------------- Commit messages: - revertdc removal - 8332112: Update nsk.share.Log to don't be Finalizable Changes: https://git.openjdk.org/jdk/pull/19209/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19209&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332112 Stats: 141 lines in 30 files changed: 2 ins; 131 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19209/head:pull/19209 PR: https://git.openjdk.org/jdk/pull/19209 From dlong at openjdk.org Sun May 12 22:26:04 2024 From: dlong at openjdk.org (Dean Long) Date: Sun, 12 May 2024 22:26:04 GMT Subject: RFR: 8330171: Lazy W^X switch implementation In-Reply-To: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com> References: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com> Message-ID: On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin wrote: > An alternative for preemptively switching the W^X thread mode on macOS with an AArch64 CPU. This implementation triggers the switch in response to the SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this approach, it is now feasible to eliminate all WX guards and avoid potentially costly operations. However, no significant improvement or degradation in performance has been observed. Additionally, considering the issue with AsyncGetCallTrace, the patched JVM has been successfully operated with [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and [async-profiler](https://github.com/async-profiler/async-profiler). > > Additional testing: > - [x] MacOS AArch64 server fastdebug *gtets* > - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4* > - [ ] Benchmarking > > @apangin and @parttimenerd could you please check the patch on your scenarios??
I think there is a sweet-spot middle-ground between the two extremes: full-lazy, ideal for performance, and fine-grained execute-by-default, ideal for security. I don't think we should change to full-lazy and remove all the guard rails at this time. I am investigating execute-by-default, and it looks promising. ------------- Changes requested by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18762#pullrequestreview-2051465621 From dholmes at openjdk.org Mon May 13 01:39:09 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 May 2024 01:39:09 GMT Subject: RFR: 8332066: AArch64: Math test failures since JDK-8331558 In-Reply-To: References: Message-ID: On Fri, 10 May 2024 13:12:27 GMT, Andrew Haley wrote: > Revert "8331558: AArch64: optimize integer remainder" > This reverts commit dab92c51c70767abcda3b1a91dd7d1a9b40290c1. If this was a backout of the earlier change then it should have followed the official process for backing out a change: https://openjdk.org/guide/#backing-out-a-change ------------- PR Comment: https://git.openjdk.org/jdk/pull/19177#issuecomment-2106479510 From dholmes at openjdk.org Mon May 13 01:47:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 May 2024 01:47:05 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't be Finalizable In-Reply-To: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> Message-ID: On Sun, 12 May 2024 21:34:41 GMT, Leonid Mesnik wrote: > The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook. > The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 > > The cleanup() method flush stream and print summary which should be already printed by complain method. 
> > This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. However, the error should have been printed already in this method. > > So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests. > > Note: The 'verboseOnErrorEnabled' is just not used. > > See isVerboseOnErrorEnabled. > > public boolean isVerboseOnErrorEnabled() { > return errorsSummaryEnabled; > } > > > Tested with by running tests with different combinations (tier4-7) and tier1. There seems to be very little in this PR that pertains to "finalize" so perhaps the JBS title etc could be updated to reflect what most of this PR is actually about. > However, now it is called only during shutdown hook. Where does this get set up? ------------- PR Review: https://git.openjdk.org/jdk/pull/19209#pullrequestreview-2051556955 From fyang at openjdk.org Mon May 13 04:15:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 May 2024 04:15:11 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 15:07:09 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. 
>> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'master' into jal-fixes > - Revert JNI field, call()->li() > - Use li instead of movptr for call > - REVERT: Use li instead of movptr > - Use li instead of movptr > - VM leaf should use li > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Corrected method name > - ... 
and 2 more: https://git.openjdk.org/jdk/compare/b8d8e2ca...d53e9694 src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 717: > 715: return x < (twoG - twoK) && x >= (-twoG - twoK); > 716: } > 717: Can you add a code comment for this function to make it easier to understand? Maybe something like: `Ensure that the auipc can reach the destination at x from anywhere within the code cache so that if it is relocated we know it will still reach.` src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 738: > 736: typedef void (MacroAssembler::* load_insn_by_temp)(Register Rt, address dest, Register temp); > 737: > 738: void wrap_label(Register r, Label &L, Register t, load_insn_by_temp insn); Better to remove the `load_insn_by_temp` declaration from file macroAssembler_riscv.hpp as it is not used anymore after this change. `typedef void (MacroAssembler::* load_insn_by_temp)(Register Rt, address dest, Register temp);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1597846211 PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1597844719 From stuefe at openjdk.org Mon May 13 04:55:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 May 2024 04:55:24 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: > MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. 
> > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Update mallocLimit.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19172/files - new: https://git.openjdk.org/jdk/pull/19172/files/9a27048a..42361558 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19172&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19172&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19172/head:pull/19172 PR: https://git.openjdk.org/jdk/pull/19172 From rehn at openjdk.org Mon May 13 06:19:11 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 06:19:11 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: <0fZosZuwQ6oUzBprAnR_V32Gt5gTK5QF5khkQg4aqi0=.4776b227-eb5f-4cb7-90a3-2059ba554bce@github.com> On Fri, 10 May 2024 07:56:06 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Merge branch 'master' into jal-fixes >> - Revert JNI field, call()->li() >> - Use li instead of movptr for call >> - REVERT: Use li instead of movptr >> - Use li instead of movptr >> - VM leaf should use li >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Corrected method name >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/4aeaf336...d53e9694 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 653: > >> 651: push_reg(RegSet::of(t0, xmethod), sp); // push << t0 & xmethod >> to sp >> 652: mv(t0, entry_point, offset); >> 653: Assembler::jalr(x1, t0, offset); > > You might want a `jalr(t0, offset)` here? > > (I also see several other occurrences of `Assembler::jalr(Register Rd, Register Rs, const int32_t offset)` like in `MacroAssembler::call`. Is there a reason to not use `MacroAssembler::jalr(Register Rs, int32_t offset)` for those places?) Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1597927063 From jsjolen at openjdk.org Mon May 13 06:26:04 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 06:26:04 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 04:55:24 GMT, Thomas Stuefe wrote: >> MEMFLAGS, as well as its enum constants, should live in its own include. >> >> The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. >> >> The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. >> >> --- >> >> This patch moves the enum to its new file. >> >> It fixes those `allocation.hpp` includes that were only needed to get MEMFLAGS. It does not fix other includes. >> >> For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp.
>> >> I tested (built) on: >> - MacOS aarch64, no precompiled headers, fastdebug >> - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Update mallocLimit.hpp Seems reasonable and LGTM, thanks. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19172#pullrequestreview-2051788348 From jsjolen at openjdk.org Mon May 13 06:30:13 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 06:30:13 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v67] In-Reply-To: <5ZO2gFz9Mwb3V8g71tnSpzaGUfEmHsUK_DJbw7fbVAE=.f5c95bc6-fd99-4bde-9c30-7bdcef3234b9@github.com> References: <5ZO2gFz9Mwb3V8g71tnSpzaGUfEmHsUK_DJbw7fbVAE=.f5c95bc6-fd99-4bde-9c30-7bdcef3234b9@github.com> Message-ID: On Tue, 7 May 2024 13:32:53 GMT, Thomas Stuefe wrote: >I would also make this method debug-only. It is in `#define ASSERT`, that's what you meant, right? Rest sounds good.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1597935810 From jsjolen at openjdk.org Mon May 13 06:34:14 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 06:34:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v67] In-Reply-To: <5ZO2gFz9Mwb3V8g71tnSpzaGUfEmHsUK_DJbw7fbVAE=.f5c95bc6-fd99-4bde-9c30-7bdcef3234b9@github.com> References: <5ZO2gFz9Mwb3V8g71tnSpzaGUfEmHsUK_DJbw7fbVAE=.f5c95bc6-fd99-4bde-9c30-7bdcef3234b9@github.com> Message-ID: On Tue, 7 May 2024 13:40:07 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision: >> >> - Remove GEQ_B >> - Move things around slightly to be closer to usage >> - Simplify code >> - Remove superfluous comment > > src/hotspot/share/nmt/nmtTreap.hpp line 66: > >> 64: _right(nullptr) { >> 65: } >> 66: > > condense this code a little? Getters can be one-liners I personally prefer it being non-condensed, but Afshin also mentioned this so I consider myself over-ruled :). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1597939452 From jsjolen at openjdk.org Mon May 13 06:54:12 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 06:54:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 11:04:41 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Some style > > src/hotspot/share/nmt/nmtTreap.hpp line 287: > >> 285: if (leqB != nullptr && leqB->key() == key) { >> 286: return leqB; >> 287: } > > I don't get this, why is this leq search needed? And if it's needed, is that not redundant to the code below that compares for key equality? Hi, this function is deleted as it's no longer in use.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1597957808 From rehn at openjdk.org Mon May 13 07:00:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 07:00:16 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: <4gLXDJvqkICJALID11qOTUjW_HKTVP0Ltnq3bN1UPoY=.35148a65-1a55-420e-99b7-72ad324a03e1@github.com> On Fri, 10 May 2024 08:00:49 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Merge branch 'master' into jal-fixes >> - Revert JNI field, call()->li() >> - Use li instead of movptr for call >> - REVERT: Use li instead of movptr >> - Use li instead of movptr >> - VM leaf should use li >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Corrected method name >> - ... and 2 more: https://git.openjdk.org/jdk/compare/3438e4d1...d53e9694 > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 2836: > >> 2834: Rd == x0 && >> 2835: is_simm12(offset) && ((offset % 2) == 0)) { >> 2836: c_j(offset); > > Is RV32C-only instructions usable for RV64C which is our case? Or will this if block be test covered? > > The spec says: > > In addition, RV32C includes a compressed jump and link instruction to compress > short-range subroutine calls, where the same opcode is used to compress ADDIW for RV64C and > RV128C. There are two instructions using the CJ format: c.j and c.jal. - c.j is already in our assembler. - c.jal is not in our assembler and is RV32C only. 000000000000064e : 64e: 11 a0 j 0x652 // 0xa011 is c.j 4 650: 2a 85 mv a0, a0 652: 82 80 ret Works fine on both VF2 and qemu-rv64. The comment was just saying that we can't try to map to c.jal since it's RV32C. 
The instruction c.j has the same test coverage as before (as we already have it in the assembler). The toggle to C I think is untested as it can only be generated for short backwards branches in non-relocated code. As the comment is obviously confusing, suggestions to change it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1597965012 From fyang at openjdk.org Mon May 13 07:21:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 May 2024 07:21:05 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: <4gLXDJvqkICJALID11qOTUjW_HKTVP0Ltnq3bN1UPoY=.35148a65-1a55-420e-99b7-72ad324a03e1@github.com> References: <4gLXDJvqkICJALID11qOTUjW_HKTVP0Ltnq3bN1UPoY=.35148a65-1a55-420e-99b7-72ad324a03e1@github.com> Message-ID: On Mon, 13 May 2024 06:57:26 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/assembler_riscv.hpp line 2836: >> >>> 2834: Rd == x0 && >>> 2835: is_simm12(offset) && ((offset % 2) == 0)) { >>> 2836: c_j(offset); >> >> Is RV32C-only instructions usable for RV64C which is our case? Or will this if block be test covered? >> >> The spec says: >> >> In addition, RV32C includes a compressed jump and link instruction to compress >> short-range subroutine calls, where the same opcode is used to compress ADDIW for RV64C and >> RV128C. > > There are two instructions using the CJ format: c.j and c.jal. > - c.j is already in our assembler. > - c.jal is not in our assembler and is RV32C only. > > > 000000000000064e : > 64e: 11 a0 j 0x652 // 0xa011 is c.j 4 > 650: 2a 85 mv a0, a0 > 652: 82 80 ret > > > Works fine on both VF2 and qemu-rv64. > > The comment was just saying that we can't try to map to c.jal since it's RV32C. > > The instruction c.j has the same test coverage as before (as we already have it in the assembler). > The toggle to C I think is untested as it can only be generated for short backwards branches in non-relocated code. > > As the comment is obviously confusing, suggestions to change it?
Uh, I see. I think I mis-read the code comment. I am OK with the current shape. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1597985825 From fyang at openjdk.org Mon May 13 07:30:09 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 May 2024 07:30:09 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: On Wed, 8 May 2024 15:07:09 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. 
>> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'master' into jal-fixes > - Revert JNI field, call()->li() > - Use li instead of movptr for call > - REVERT: Use li instead of movptr > - Use li instead of movptr > - VM leaf should use li > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Corrected method name > - ... and 2 more: https://git.openjdk.org/jdk/compare/0d421873...d53e9694 src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 858: > 856: assert(temp != noreg && temp != x0, "expecting a register"); > 857: int32_t offset = 0; > 858: movptr(temp, dest, offset); One more question: will `la(temp, dest, offset)` work here? I see you've changed to use `la` for `MacroAssembler::j(const address dest, Register temp)` instead of the `movptr` used before. Seems that we can do similar change here as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1597996185 From rehn at openjdk.org Mon May 13 07:40:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 07:40:16 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 04:05:26 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Merge branch 'master' into jal-fixes >> - Revert JNI field, call()->li() >> - Use li instead of movptr for call >> - REVERT: Use li instead of movptr >> - Use li instead of movptr >> - VM leaf should use li >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Corrected method name >> - ... and 2 more: https://git.openjdk.org/jdk/compare/e688815a...d53e9694 > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 738: > >> 736: typedef void (MacroAssembler::* load_insn_by_temp)(Register Rt, address dest, Register temp); >> 737: >> 738: void wrap_label(Register r, Label &L, Register t, load_insn_by_temp insn); > > Better to remove the `load_insn_by_temp` declaration from file macroAssembler_riscv.hpp as it is not used anymore after this change. > > `typedef void (MacroAssembler::* load_insn_by_temp)(Register Rt, address dest, Register temp);` Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1598009838 From rehn at openjdk.org Mon May 13 08:02:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 08:02:37 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v11] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. 
> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. > Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. > The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. > - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. 
> > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Review changes - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - Revert JNI field, call()->li() - Use li instead of movptr for call - REVERT: Use li instead of movptr - Use li instead of movptr - VM leaf should use li - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - ... and 4 more: https://git.openjdk.org/jdk/compare/93beb923...c9b59d93 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/d53e9694..c9b59d93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=09-10 Stats: 17499 lines in 277 files changed: 8723 ins; 6115 del; 2661 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From rehn at openjdk.org Mon May 13 08:02:39 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 08:02:39 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 04:09:36 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: >> >> - Merge branch 'master' into jal-fixes >> - Revert JNI field, call()->li() >> - Use li instead of movptr for call >> - REVERT: Use li instead of movptr >> - Use li instead of movptr >> - VM leaf should use li >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Corrected method name >> - ... and 2 more: https://git.openjdk.org/jdk/compare/41e8083a...d53e9694 > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 717: > >> 715: return x < (twoG - twoK) && x >= (-twoG - twoK); >> 716: } >> 717: > > Can you add a code comment for this new function (`is_32bit_offset_from_codecache`) to make it easier to understand? Maybe something like: > `Ensure that the auipc can reach the destination at x from anywhere within the code cache so that if it is relocated we know it will still reach.` Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1598037295 From aph at openjdk.org Mon May 13 08:42:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 13 May 2024 08:42:07 GMT Subject: RFR: 8332066: AArch64: Math test failures since JDK-8331558 In-Reply-To: References: Message-ID: <5RxDBhPOV324hGgMAIdYjIbvC1YmvzZLoYe9Pc59dfY=.bb9c5ff6-c4bc-4548-b106-46e2cf993170@github.com> On Mon, 13 May 2024 01:36:28 GMT, David Holmes wrote: > If this was a backout of the earlier change then it should have followed the official process for backing out a change: https://openjdk.org/guide/#backing-out-a-change What does that mean in this case? There was an issue raised about the regression, [JDK-8332066](https://bugs.openjdk.org/browse/JDK-8332066), but the process doesn't mention an issue of this kind. What should happen to it? Should an _additional_ backout issue have been created? Or should JDK-8332066 have been renamed? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19177#issuecomment-2106981614 From tschatzl at openjdk.org Mon May 13 08:59:11 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 13 May 2024 08:59:11 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v11] In-Reply-To: <-TstTKGbE-Ewn6GcQHrBqW4XQPpeMmOwxb-TeeXMLdA=.4e38776a-1439-4112-9350-9cec07c0bd83@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> <-TstTKGbE-Ewn6GcQHrBqW4XQPpeMmOwxb-TeeXMLdA=.4e38776a-1439-4112-9350-9cec07c0bd83@github.com> Message-ID: On Sun, 12 May 2024 06:01:27 GMT, Lei Zaakjyu wrote: >> follow up 8267941 > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > fix I'm good with leaving the `heapRegionIterator()` method name as is, but please make sure that @plummercj is good with this too. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18871#pullrequestreview-2052094846 From ayang at openjdk.org Mon May 13 09:47:12 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 13 May 2024 09:47:12 GMT Subject: RFR: 8329839: Cleanup ZPhysicalMemoryBacking trace logging In-Reply-To: References: Message-ID: <-Vi1F9tLhV6tCthw0-O_pIKAH85RW0QyP709rR3o3FA=.bade0fea-7bc6-4a8d-928e-81f4441e6aa5@github.com> On Mon, 8 Apr 2024 09:12:33 GMT, Axel Boldt-Christmas wrote: > On bsd the MB scaling is only performed on the length and not the base offset so the numbers printed are wrong. > > On all other platforms the `zoffset` type is used incorrectly and should use `zoffset_end` when printing offsets that point to the end of a range. Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18671#pullrequestreview-2052206406 From stefank at openjdk.org Mon May 13 10:06:17 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 10:06:17 GMT Subject: RFR: 8330275: Crash in XMark::follow_array [v6] In-Reply-To: References: Message-ID: <5UNz9xRBR7ZN44DF3siVI2XKEUr-q1GUovaTo3xXDWY=.848e3fc8-e24c-4512-a724-0adbf2c03d9e@github.com> On Wed, 8 May 2024 16:34:07 GMT, Ashutosh Mehra wrote: >> This PR addresses the issue in ZGC where the number of address offset bits can go beyond the limit imposed by the encoding scheme in mark stack, thereby causing the encoding to fail. >> Encoding of partial array offset in mark stack requires that the max address bit be no more than 46 bit. ~~But the current mechanism to probe maximum address offset bits on aarch64, riscv and ppc platforms can return value larger that 44 bits. This patch sets the maximum address offset bits to 44.~~ >> >> ~~I have updated the generational mode to avoid subtracting 3 bits from the maximum address offset bit probed by the system, as the generational mode does not use multi-mapping.~~ >> >> ~~I have also updated the code to set MarkPartialArrayMinSizeShift dynamically depending on the number of address offset bits used. This would avoid running into such problem again if in future maximum address offset bits is increased beyond 44.~~ >> >> ~~For some reason (that I can't comprehend from the code) the existing implementation for probing the max addressable bit for ppc in non-generation ZGC is very different from other platforms and from generational mode as well. 
I have kept the existing implementation as is and just fixed it to ensure it does not return value greater than 44 bits.~~ >> >> Testing: ~~test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86~~ tier1, tier2 and tier3 on aarch64 using fastdebug build with options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as per the suggestion in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275?focusedId=14667864&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14667864)) >> >> Update: Striked out the changes that are not relevant now that it is only doing a point fix for aarch64 > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Restore the comment around max addressable memory but leave out actual numbers that can be confusing > > Signed-off-by: Ashutosh Mehra src/hotspot/cpu/aarch64/gc/z/zAddress_aarch64.cpp line 41: > 39: // Default value if probing is not implemented for a certain platform > 40: // Max address bit is restricted by implicit assumptions in the code, for instance > 41: // the bit layout of XForwardingEntry or Partial array entry (see XMarkStackEntry) in mark stack This comment was copy-n-pasted without updating the names to ZForwardingEntry and ZMarkStackEntry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18941#discussion_r1598214359 From rehn at openjdk.org Mon May 13 10:20:30 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 10:20:30 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly use the asm for call/jumps instead masm. > Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. 
> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) > > j offset jal x0, offset Jump > jal offset jal x1, offset Jump and link > jr rs jalr x0, rs, 0 Jump register > jalr rs jalr x1, rs, 0 Jump and link register > ret jalr x0, x1, 0 Return from subroutine > call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine > tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine > > But these can only be implemented like this if you have small enough application. > The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). > We don't have GOT, instead we materialize, so there is still differences between these and ours. > > This patch: > - Tries to follow these suggested mappings as good we can. > - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) > - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. > E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. > - I enabled c.j, but right now we never generate it. > - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) > > I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. > (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) > While looking into our calls it was a bit confusing, this helps. > > Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) > Re-running tests, had some last minute changes. 
> > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Use la() instead movptr where ok. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/c9b59d93..b663e872 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=10-11 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From rehn at openjdk.org Mon May 13 10:20:32 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 10:20:32 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v10] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 07:26:14 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Merge branch 'master' into jal-fixes >> - Revert JNI field, call()->li() >> - Use li instead of movptr for call >> - REVERT: Use li instead of movptr >> - Use li instead of movptr >> - VM leaf should use li >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Corrected method name >> - ... and 2 more: https://git.openjdk.org/jdk/compare/2d58c054...d53e9694 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 858: > >> 856: assert(temp != noreg && temp != x0, "expecting a register"); >> 857: int32_t offset = 0; >> 858: movptr(temp, dest, offset); > > One more question: will `la(temp, dest, offset)` work here? 
I see you've changed to use `la` for `MacroAssembler::j(const address dest, Register temp)` instead of the `movptr` used before. Seems that we can do similar change here as well. > (BTW: `rt_call` is another place where we could make use of `la` to replace `movptr` in the else block) Jump can 'only' go to code cache (if someone jumps to C/C++ method we would be in trouble). Therefore I changed to la(), otherwise I tried to keep generate code same:ish. I don't see any reason for why we can't use it for those cases. I think any movptr in a compressible region should be la(). Sanity tested with RCC 2047 and unset. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18942#discussion_r1598231289 From stefank at openjdk.org Mon May 13 10:22:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 10:22:09 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 04:55:24 GMT, Thomas Stuefe wrote: >> MEMFLAGS, as well as its enum constants, should live in its own include. >> >> The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. >> >> The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. >> >> --- >> >> This patch moves the enum to its new file. >> >> It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. >> >> For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. 
>> >> I tested (built) on: >> - MacOS aarch64, no precompiled headers, fastdebug >> - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Update mallocLimit.hpp Changes requested by stefank (Reviewer). src/hotspot/share/nmt/mallocTracker.hpp line 29: > 27: #define SHARE_NMT_MALLOCTRACKER_HPP > 28: > 29: #include "nmt/memflags.hpp" Should go after mallocHeader.hpp src/hotspot/share/nmt/memflags.cpp line 27: > 25: #include "precompiled.hpp" > 26: > 27: #include "nmt/memflags.hpp" There should be no blankline between precompiled.hpp and the rest of the includes. src/hotspot/share/nmt/memflags.cpp line 31: > 29: > 30: // Extra insurance that MEMFLAGS truly has the same size as uint8_t. > 31: STATIC_ASSERT(sizeof(MEMFLAGS) == sizeof(uint8_t)); I think you can remove this entire .cpp file. There's no need to check the size of an enum with a specified base type. src/hotspot/share/nmt/memflags.hpp line 30: > 28: #include "utilities/globalDefinitions.hpp" > 29: > 30: #define MEMORY_TYPES_DO(f) \ Open-ended comment/question: We call it MEMORY_TYPE and mt, but then we call the type MEMFLAGS (with a completely non-standard UPPERCASE style). Maybe it is time to rename MEMFLAGS? src/hotspot/share/services/mallocLimit.cpp line 28: > 26: #include "precompiled.hpp" > 27: > 28: #include "nmt/memflags.hpp" While poking around in the includes, could you remove the blankline on 27. This style inconsistency has slowly crept into the code base. 
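On the `memflags.cpp` point above (no need to check the size of an enum with a specified base type), a minimal sketch of why the check is redundant. `MemFlagSketch` and its enumerators are hypothetical stand-ins, not the real `MEMFLAGS` definition; the assumption is only that the real enum is declared with `uint8_t` as its base type, as the review comment implies.

```c++
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for MEMFLAGS: an enum with a fixed underlying type.
enum class MemFlagSketch : uint8_t {
  mtJavaHeapSketch,
  mtNoneSketch
};

// With a fixed underlying type, sizeof(enum) is sizeof(underlying type)
// by definition, so neither of these checks can ever fail.
static_assert(std::is_same<std::underlying_type<MemFlagSketch>::type, uint8_t>::value,
              "underlying type is the one written in the declaration");
static_assert(sizeof(MemFlagSketch) == sizeof(uint8_t),
              "enum size equals the size of its fixed underlying type");

// Helper so the size can also be inspected at run time.
constexpr std::size_t memflag_size() { return sizeof(MemFlagSketch); }
```

The assertions hold for any enum declared this way, which is the reason a dedicated .cpp file containing only such a `STATIC_ASSERT` adds nothing.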
------------- PR Review: https://git.openjdk.org/jdk/pull/19172#pullrequestreview-2052269037 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598224275 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598225428 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598226640 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598229830 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598231329 From jsjolen at openjdk.org Mon May 13 10:30:17 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 10:30:17 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> References: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> Message-ID: On Fri, 10 May 2024 11:58:42 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Some style > > src/hotspot/share/nmt/vmatree.cpp line 156: > >> 154: if (to_be_deleted_inbetween_a_b.length() == 0 && LEQ_A_found) { >> 155: // We must have smashed a hole in an existing region (or replaced it entirely). >> 156: // LEQ_A - A - B - (some node >= B) > > nit, clearer comment (a bit) since at first glance this looks like subtraction: `LEQ_A < A < B < (some node >= B)`.
Alternatively, `LEQ_A [A, B) C` `// LEQ_A < A < B <= C` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598243943 From jsjolen at openjdk.org Mon May 13 10:34:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 10:34:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: <5V3WB9R1kvj6MFOU8rt8_XeiMJy4UHmS-DvSHxwiwGE=.b7514b9c-e875-4621-a325-a068b7754358@github.com> References: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> <5V3WB9R1kvj6MFOU8rt8_XeiMJy4UHmS-DvSHxwiwGE=.b7514b9c-e875-4621-a325-a068b7754358@github.com> Message-ID: On Fri, 10 May 2024 12:12:55 GMT, Thomas Stuefe wrote: >> src/hotspot/share/nmt/vmatree.cpp line 2: >> >>> 1: /* >>> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. >> >> Be a dear and add us, please :-) > > Though by this time, you spent more time on the VMATree than I originally did. Still .. Added :-). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598246877 From fyang at openjdk.org Mon May 13 10:46:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 May 2024 10:46:05 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:20:30 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly uses the asm for calls/jumps instead of MASM. >> Our MASM has somewhat odd naming, and we don't use 'proper' pseudoinstructions/mnemonics.
>> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. 
>> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use la() instead movptr where ok. Updated change looks great! Thank you. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18942#pullrequestreview-2052334710 From mcimadamore at openjdk.org Mon May 13 11:08:51 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 13 May 2024 11:08:51 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI Message-ID: This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: * `System::load` and `System::loadLibrary` are now restricted methods * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. 
This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. ------------- Commit messages: - Initial push Changes: https://git.openjdk.org/jdk/pull/19213/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331671 Stats: 466 lines in 99 files changed: 301 ins; 53 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From mcimadamore at openjdk.org Mon May 13 11:08:51 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 13 May 2024 11:08:51 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:42:26 GMT, Maurizio Cimadamore wrote: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. 
> > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. Javadoc: https://cr.openjdk.org/~mcimadamore/jdk/8331671/v1/javadoc/api/index.html Specdiff: https://cr.openjdk.org/~mcimadamore/jdk/8331671/v1/specdiff_out/overview-summary.html make/conf/module-loader-map.conf line 105: > 103: java.smartcardio \ > 104: jdk.accessibility \ > 105: jdk.attach \ The list of allowed modules has been rewritten from scratch, by looking at the set of modules containing at least one `native` method declaration. src/hotspot/share/prims/nativeLookup.cpp line 277: > 275: > 276: Klass* klass = vmClasses::ClassLoader_klass(); > 277: Handle jni_class(THREAD, method->method_holder()->java_mirror()); This is the biggest change in this PR. That is, we need to pass enough arguments to `ClassLoader::findNative` so that the method can start a restricted check accordingly. 
src/java.base/share/classes/java/lang/Module.java line 311: > 309: Module target = moduleForNativeAccess(); > 310: ModuleBootstrap.IllegalNativeAccess illegalNativeAccess = ModuleBootstrap.illegalNativeAccess(); > 311: if (illegalNativeAccess != ModuleBootstrap.IllegalNativeAccess.ALLOW && There are some changes in this code: * this code is no-op if `--illegal-native-access` is set to `allow` * we also attach the location of the problematic class to the warning message, using `CodeSource` * we use the "initial error stream" to emit the warning, similarly to what is done for other runtime warnings src/java.base/share/classes/jdk/internal/reflect/Reflection.java line 115: > 113: @ForceInline > 114: public static void ensureNativeAccess(Class currentClass, Class owner, String methodName) { > 115: if (VM.isModuleSystemInited()) { If we call this code too early, we can see cases where `module` is `null`. src/java.desktop/macosx/classes/com/apple/eio/FileManager.java line 61: > 59: } > 60: > 61: @SuppressWarnings({"removal", "restricted"}) There are several of these changes. One option might have been to just disable restricted warnings when building. But on a deeper look, I realized that in all these places we already disabled deprecation warnings for the use of security manager, so I also added a new suppression instead. test/jdk/java/foreign/enablenativeaccess/panama_jni_load_module/module-info.java line 24: > 22: */ > 23: > 24: module panama_jni_load_module { This module setup is a bit convoluted, but I wanted to make sure that we got separate warnings for `System.loadLibrary` and binding of the `native` method, and that warning on the _use_ of the native method was not generated (typically, all three operations occur in the same module). 
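The `--illegal-native-access=allow/warn/deny` policy described in the PR summary can be sketched as a small decision function. This is a simplified model written in C++ only for consistency with the other snippets in this thread; the names `IllegalNativeAccess`, `Outcome` and `check_restricted_call` are hypothetical, module names are plain strings, and details such as unnamed modules or one-warning-per-module behaviour are deliberately left out.

```c++
#include <set>
#include <string>

// Hypothetical model of the three policy values of --illegal-native-access.
enum class IllegalNativeAccess { Allow, Warn, Deny };

// What happens to a call to a restricted method in this model.
enum class Outcome { Proceed, ProceedWithWarning, Throw };

// Decide the outcome of a restricted call made from `caller_module`.
// `enabled_modules` models the set given via --enable-native-access.
inline Outcome check_restricted_call(const std::set<std::string>& enabled_modules,
                                     const std::string& caller_module,
                                     IllegalNativeAccess policy) {
  if (enabled_modules.count(caller_module) != 0) {
    return Outcome::Proceed;  // explicitly enabled: the policy is not consulted
  }
  switch (policy) {
    case IllegalNativeAccess::Allow: return Outcome::Proceed;             // no warning
    case IllegalNativeAccess::Warn:  return Outcome::ProceedWithWarning;  // default policy
    case IllegalNativeAccess::Deny:  return Outcome::Throw;               // models IllegalCallerException
  }
  return Outcome::Throw;  // not reached for valid policy values
}
```

In this model the policy only matters for modules outside the enabled set, which matches the description that the flag governs access "found outside the set of modules specified with `--enable-native-access`".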
------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2107272261 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598269825 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598271285 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598274987 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598276455 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598277853 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598279827 From jsjolen at openjdk.org Mon May 13 11:14:40 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 11:14:40 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v75] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with seven additional commits since the last revision: - Constify more of the API - Introduce empty_regiondata - Expand fst, snd and remove this-> - Rename - Fixes - Rename to RegionData - Cleanup tstuefe no0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/137af84f..622fe067 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=74 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=73-74 Stats: 213 lines in 4 files changed: 36 ins; 48 del; 129 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Mon May 13 11:25:39 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 11:25:39 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v75] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 11:14:40 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with seven additional commits since the last revision: > > - Constify more of the API > - Introduce empty_regiondata > - Expand fst, snd and remove this-> > - Rename > - Fixes > - Rename to RegionData > - Cleanup tstuefe no0 One more round of handling reviews. Still have a lot left. ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2052318563 From jsjolen at openjdk.org Mon May 13 11:25:40 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 13 May 2024 11:25:40 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 10:38:14 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Some style > > src/hotspot/share/nmt/nmtTreap.hpp line 203: > >> 201: TreapNode* last_seen = nullptr; >> 202: bool failed = false; >> 203: this->visit_in_order([&](TreapNode* node) { > > Here, and in other places: what's with the this-> ? Just being very explicit :-). No need for it. > src/hotspot/share/nmt/nmtTreap.hpp line 240: > >> 238: DEBUG_ONLY(_node_count++;) >> 239: // Doesn't exist, make node >> 240: void* node_place = ALLOCATOR::allocate(sizeof(TreapNode)); > > Please make it explicit in the class definition that the ALLOCATOR must be checking for oom and exit or whatever. Done. > src/hotspot/share/nmt/nmtTreap.hpp line 254: > >> 252: >> 253: // (LEQ_k, GT_k) >> 254: node_pair fst_split = split(this->_root, k, LEQ); > > Can we afford some more letters please? :-) fst : first snd : second > But since you use left and right in other places, I'd use that too here. Left/right are a bit confusing here, first and second do not correspond to each other in a left/right fashion while the resulting pair of a split does.
It is true that `merge(second.left, second.right) == first.left`. > src/hotspot/share/nmt/nmtTreap.hpp line 307: > >> 305: } >> 306: >> 307: TreapNode* closest_leq(const K& key) { > > I don't understand the naming of the variables. What is A? _n? _r? > And "_head" is somewhat misleading. I would have named head=pos or current, leqA_n = best or found or candidate or best_so_far... any of these That's fair > src/hotspot/share/nmt/virtualMemoryTracker.hpp line 33: > >> 31: #include "nmt/allocationSite.hpp" >> 32: #include "nmt/nmtCommon.hpp" >> 33: #include "runtime/atomic.hpp" > > Please add includes only where needed, directly. Let's not rely on indirect includes. Unless this is a remnant from some earlier version, then pls just remove it. L71: ```c++ inline size_t peak_size() const { return Atomic::load(&_peak_size); } We probably were relying on an indirect include previously. Can we keep this even though it's irrelevant to the PR :-)? > src/hotspot/share/nmt/vmatree.hpp line 117: > >> 115: } >> 116: }; >> 117: > > Do `IntervalState` and `IntervalChange` need to be exposed in the header? No, they do not. I've found it difficult to clean up so that we have the minimal amount of private/public switches in the class, so I'll do the easiest change instead of the cleanest change. 
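For the `closest_leq` naming discussion above, a minimal sketch of the search using the suggested names (`current`, `best_so_far`). It runs on a plain binary search tree with `int` keys; `NodeSketch` and `sample_tree` are hypothetical and stand in for the real `TreapNode`, which additionally carries a priority and a value.

```c++
// Hypothetical minimal node; only the ordering by key matters for this search.
struct NodeSketch {
  int key;
  NodeSketch* left;
  NodeSketch* right;
};

// Returns the node with the largest key <= `key`,
// or nullptr if every key in the tree is greater than `key`.
inline const NodeSketch* closest_leq(const NodeSketch* root, int key) {
  const NodeSketch* best_so_far = nullptr;
  const NodeSketch* current = root;
  while (current != nullptr) {
    if (current->key == key) {
      return current;           // an exact match cannot be improved on
    }
    if (current->key < key) {
      best_so_far = current;    // candidate; anything closer must be to the right
      current = current->right;
    } else {
      current = current->left;  // too large; all candidates are to the left
    }
  }
  return best_so_far;
}

// Small fixed tree for illustration: root 20, left child 10,
// right child 40, and 30 as the left child of 40.
inline const NodeSketch* sample_tree() {
  static NodeSketch n10{10, nullptr, nullptr};
  static NodeSketch n30{30, nullptr, nullptr};
  static NodeSketch n40{40, &n30, nullptr};
  static NodeSketch n20{20, &n10, &n40};
  return &n20;
}
```

With these names the loop reads as "walk down from the root, remembering the best candidate seen so far", which may be easier to follow than the `_head` / `leqA_n` naming questioned in the review.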
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598277413 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598276045 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598274713 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598269216 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598251906 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598256008 From jsjolen at openjdk.org Mon May 13 11:25:40 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 13 May 2024 11:25:40 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: <5V3WB9R1kvj6MFOU8rt8_XeiMJy4UHmS-DvSHxwiwGE=.b7514b9c-e875-4621-a325-a068b7754358@github.com> References: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> <5V3WB9R1kvj6MFOU8rt8_XeiMJy4UHmS-DvSHxwiwGE=.b7514b9c-e875-4621-a325-a068b7754358@github.com> Message-ID: On Fri, 10 May 2024 12:11:56 GMT, Thomas Stuefe wrote: >> src/hotspot/share/nmt/vmatree.hpp line 151: >> >>> 149: >>> 150: SummaryDiff release_mapping(position from, position sz) { >>> 151: Metadata empty; >> >> Just a nit, but instead of the Metadata::Metadata() ctor creating an empty object, could we possibly scrap the default ctor and have an explicit static constexpr Metadata empty with invalid stackindex and mtNone as ctor args? I find that nicer to read. > > Oh, another thing, maybe rename this to something else. `Metadata` has a clear meaning in hotspot. Maybe something like RegionData? Hi Thomas, what naming scheme should I use for the empty static RegionData? `EmptyRegionData` or `empty_regiondata`? It can't be `constexpr`, but we can make it `const`. 
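To make the question concrete, a sketch of a named empty value that replaces the default constructor. Everything here is hypothetical (`RegionDataSketch`, `FlagSketch`, the `-1` "no stack" index); it only illustrates the shape of a `static const` member defined out of class, under the assumption stated above that `constexpr` is not available for the real type.

```c++
#include <cstdint>

// Hypothetical stand-in for the memory-type enum.
enum class FlagSketch : uint8_t { mtNone, mtJavaHeap };

// Hypothetical stand-in for RegionData: a call-stack index plus a memory type.
struct RegionDataSketch {
  int32_t stack_index;  // -1 is used here to mean "no stack recorded"
  FlagSketch flag;

  // No default constructor: every instance states its contents explicitly.
  RegionDataSketch(int32_t stack_index, FlagSketch flag)
    : stack_index(stack_index), flag(flag) {}

  bool equals(const RegionDataSketch& other) const {
    return stack_index == other.stack_index && flag == other.flag;
  }

  // Named empty value, declared here and defined below.
  static const RegionDataSketch empty;
};

// Out-of-class definition; `const` rather than `constexpr`.
const RegionDataSketch RegionDataSketch::empty(-1, FlagSketch::mtNone);
```

A call site such as `release_mapping` could then pass `RegionDataSketch::empty` instead of a default-constructed local. Whether the name is spelled `EmptyRegionData` or `empty_regiondata` is the style question left open above.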
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598288301 From shade at openjdk.org Mon May 13 11:26:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 May 2024 11:26:09 GMT Subject: RFR: 8332082: Shenandoah: Use SATB active flag for C2 pre-write barrier on x86 and PPC In-Reply-To: References: Message-ID: On Fri, 10 May 2024 16:13:51 GMT, William Kemper wrote: > This is consistent with c1 and other platforms. I agree there is an inconsistency, but I also see it is deeper than just these two. https://github.com/openjdk/jdk/blob/1484153c1a092cefc20270b35aa1e508280843a4/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp#L696-L698 Before we go hard on either gc-state or SATB "active" flag, let's decide which way we go? The underlying issue, IIRC, was that hardly any other GC implementation has a GC state flag, so they are forced to use the SATB "active" flags. But I am thinking that testing the gc-state flag is better on these paths, since the gc-state flag is guaranteed to be uncontended and fast-accessible. ------------- PR Review: https://git.openjdk.org/jdk/pull/19180#pullrequestreview-2052409802 From rehn at openjdk.org Mon May 13 11:32:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 May 2024 11:32:15 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:43:04 GMT, Fei Yang wrote: > Updated change looks great! Thank you. Thanks for sticking with it, and thanks for a good review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2107330359 From mcimadamore at openjdk.org Mon May 13 11:42:04 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 13 May 2024 11:42:04 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v2] In-Reply-To: References: Message-ID: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. > > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. 
Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Avoid call to VM::isModuleSystemInited Use initial error stream ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/d9fe9a71..c4938dc7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=00-01 Stats: 11 lines in 2 files changed: 3 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From mcimadamore at openjdk.org Mon May 13 11:42:04 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 13 May 2024 11:42:04 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 11:38:40 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. 
>> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Avoid call to VM::isModuleSystemInited > Use initial error stream src/java.base/share/classes/jdk/internal/reflect/Reflection.java line 124: > 122: if (module != null) { > 123: // not in init phase > 124: Holder.JLA.ensureNativeAccess(module, owner, methodName, currentClass); In an earlier iteration I had a call to `VM::isModuleSystemInited`, but I discovered that caused a performance regression, since that method involves a volatile access. Perhaps we should rethink that part of the init code to use stable fields, but it's probably better done separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598328283 From mcimadamore at openjdk.org Mon May 13 11:47:38 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 13 May 2024 11:47:38 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: References: Message-ID: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. > > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. 
Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: - Fix another typo - Fix typo - Add more comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/c4938dc7..bad10942 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=01-02 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From stefank at openjdk.org Mon May 13 12:15:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 12:15:09 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v5] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 07:23:14 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. [JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Default to non generational ZGC with JVMCI This is my proposal: For this JEP, I propose that we stick with the current plan that -XX:+UseZGC always means Generational ZGC. If the user tries to enable the Graal JIT when running Generational ZGC, we'll print a warning and switch over to C2, just like we're doing today. Then, as a follow-up, we can figure out what the correct behavior should be if the user specifies an incompatible GC and compiler combination. @tkrodriguez Is this an OK action plan for you?
------------- PR Comment: https://git.openjdk.org/jdk/pull/18393#issuecomment-2107412501 From jsjolen at openjdk.org Mon May 13 12:23:30 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 13 May 2024 12:23:30 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v76] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. 
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - type == Released implies mtNone - Rename accessor to regiondata ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/622fe067..6e30331a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=75 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=74-75 Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Mon May 13 12:23:30 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 13 May 2024 12:23:30 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> References:
<36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> Message-ID: <3PMzmANSCkXiAHX1DgXrTrOgSM9dwzshODQWL99Hlt0=.60c3162b-6df4-4166-bc04-602fe8c83c10@github.com> On Fri, 10 May 2024 11:38:36 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Some style > > src/hotspot/share/nmt/vmatree.hpp line 67: > >> 65: >> 66: Metadata(NativeCallStackStorage::StackIndex stack_idx, MEMFLAGS flag) >> 67: : stack_idx(stack_idx), flag(flag) {} > > I would assert here that with state=released, we only ever want to see mtNone. We don't have the state here, do you mean in `IntervalState` ctr? I'm doing `!(type == Released) || flag == mtNone` instead of `!=` to make the usage of `~P v Q equiv. P => Q` obvious. > src/hotspot/share/nmt/vmatree.hpp line 91: > >> 89: StateType type() const { >> 90: return static_cast(type_flag[0]); >> 91: } > > Proposal: provide `is_reserved()` and `is_committed()` and replace manual comparisons with the state enum with those. Easier on the eye. Afshin mentioned this too, I believe, but I want to push back here. I prefer showing what we're doing here (simple comparison) rather than hiding it behind a utility function. Requires less jumping around when reading unknown code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598375334 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598378197 From aboldtch at openjdk.org Mon May 13 12:34:53 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 May 2024 12:34:53 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory Message-ID: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. 
The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` Where * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details, I chose to make the functionality for calculating the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there are no padding bytes where the dynamic payload is allocated (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment, as it seemed to more explicitly show the intent of the check.
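The double counting described above can be reproduced with a small stand-alone sketch. Everything in it is an assumption made for illustration: `SketchSymbol`, `SketchNode`, `dynamic_symbol_size`, `old_alloc_size` and `node_alloc_size` are hypothetical names, not the HotSpot types or the functions added by this PR, and the layout check is written as a `static_assert` only because the sketch has no runtime object to inspect (the PR itself uses a runtime assert).

```c++
#include <cstddef>

// Hypothetical stand-in for Symbol: a fixed header that is followed in
// memory by a variable number of inline payload bytes.
struct SketchSymbol {
  unsigned int   hash;
  unsigned short refcount;
  unsigned short length;
};

// Hypothetical stand-in for a hash table node: metadata first, value last.
struct SketchNode {
  SketchNode*  next;   // node metadata
  SketchSymbol value;  // inlined value; its payload continues past the node
};

// The inlined value has to end exactly where the node ends, otherwise
// padding bytes would sit where the dynamic payload is expected.
static_assert(offsetof(SketchNode, value) + sizeof(SketchSymbol) == sizeof(SketchNode),
              "value must be the last bytes of the node");

// Dynamic size of the value: fixed header plus trailing payload bytes.
constexpr std::size_t dynamic_symbol_size(std::size_t payload_bytes) {
  return sizeof(SketchSymbol) + payload_bytes;
}

// The calculation as described in the message: the header and the payload
// are each counted twice.
constexpr std::size_t old_alloc_size(std::size_t payload_bytes) {
  return sizeof(SketchNode)                    // metadata + header
       + dynamic_symbol_size(payload_bytes)    // header + payload (again)
       + payload_bytes;                        // payload (again)
}

// Intended calculation: node metadata plus the dynamic size of the value.
constexpr std::size_t node_alloc_size(std::size_t payload_bytes) {
  return offsetof(SketchNode, value) + dynamic_symbol_size(payload_bytes);
}
```

Under these assumptions, `old_alloc_size(n) - node_alloc_size(n)` equals `dynamic_symbol_size(n)` for any payload size `n`, which is the "twice the required memory" from the issue title.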
Running testing tier1-7 ------------- Commit messages: - 8332139: SymbolTableHash::Node allocations allocates twice the required memory Changes: https://git.openjdk.org/jdk/pull/19214/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19214&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332139 Stats: 22 lines in 4 files changed: 17 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19214/head:pull/19214 PR: https://git.openjdk.org/jdk/pull/19214 From stuefe at openjdk.org Mon May 13 12:54:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 May 2024 12:54:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: <3PMzmANSCkXiAHX1DgXrTrOgSM9dwzshODQWL99Hlt0=.60c3162b-6df4-4166-bc04-602fe8c83c10@github.com> References: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> <3PMzmANSCkXiAHX1DgXrTrOgSM9dwzshODQWL99Hlt0=.60c3162b-6df4-4166-bc04-602fe8c83c10@github.com> Message-ID: On Mon, 13 May 2024 12:17:42 GMT, Johan Sjölen wrote: >> src/hotspot/share/nmt/vmatree.hpp line 67: >> >>> 65: >>> 66: Metadata(NativeCallStackStorage::StackIndex stack_idx, MEMFLAGS flag) >>> 67: : stack_idx(stack_idx), flag(flag) {} >> >> I would assert here that with state=released, we only ever want to see mtNone. > > We don't have the state here, do you mean in `IntervalState` ctr? I'm doing `!(type == Released) || flag == mtNone` instead of `!=` to make the usage of `~P v Q equiv. P => Q` obvious. Sure, works for me >> src/hotspot/share/nmt/vmatree.hpp line 91: >> >>> 89: StateType type() const { >>> 90: return static_cast(type_flag[0]); >>> 91: } >> >> Proposal: provide `is_reserved()` and `is_committed()` and replace manual comparisons with the state enum with those. Easier on the eye. > > Afshin mentioned this too, I believe, but I want to push back here.
I prefer showing what we're doing here (simple comparison) rather than hiding it behind a utility function. Requires less jumping around when reading unknown code. Tiny utility functions like this are effectively resolved in modern C++ IDEs like CDS. (In stark contrast to templates, which make IDEs very confused). And a clear name provides safety against accidental typos. I leave this up to you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598420951 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598426270 From stuefe at openjdk.org Mon May 13 12:54:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 May 2024 12:54:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> <5V3WB9R1kvj6MFOU8rt8_XeiMJy4UHmS-DvSHxwiwGE=.b7514b9c-e875-4621-a325-a068b7754358@github.com> Message-ID: On Mon, 13 May 2024 11:05:47 GMT, Johan Sjölen wrote: >> Oh, another thing, maybe rename this to something else. `Metadata` has a clear meaning in hotspot. Maybe something like RegionData? > > Hi Thomas, what naming scheme should I use for the empty static RegionData? `EmptyRegionData` or `empty_regiondata`? It can't be `constexpr`, but we can make it `const`. If it's scoped to RegionData, I would just call it "empty". `RegionData::empty` is clear enough. If it's global scope, I'd call it empty_regiondata.
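The two points from this exchange, the assertion written as an implication and the class-scoped `empty` constant, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `vmatree.hpp`: the enum values, the `int` stack index, the helper `released_implies_none` and the member layout are all hypothetical; only the expression `!(type == Released) || flag == mtNone` and the name `RegionData::empty` come from the discussion.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical, reduced versions of the enums used in the review.
enum class StateType : uint8_t { Reserved, Committed, Released };
enum class SketchMemFlag : uint8_t { mtNone, mtJavaHeap };

struct RegionData {
  int           stack_idx;  // stand-in for NativeCallStackStorage::StackIndex
  SketchMemFlag flag;

  // Scoped constant: at the call site it reads as RegionData::empty.
  // Declared const (not constexpr), as suggested in the thread.
  static const RegionData empty;
};

const RegionData RegionData::empty{0, SketchMemFlag::mtNone};

// "Released implies mtNone", written as !P || Q so the implication
// P => Q is visible in the expression itself.
constexpr bool released_implies_none(StateType type, SketchMemFlag flag) {
  return !(type == StateType::Released) || flag == SketchMemFlag::mtNone;
}

struct IntervalState {
  StateType  type;
  RegionData data;

  IntervalState(StateType t, const RegionData& d) : type(t), data(d) {
    // The state is only known here, so this is where the check can live.
    assert(released_implies_none(t, d.flag) && "Released regions must carry mtNone");
  }
};
```

With these definitions, `IntervalState(StateType::Released, RegionData::empty)` passes the assertion, while constructing a `Released` state with `mtJavaHeap` would trip it in a debug build.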
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1598423629 From eastigeevich at openjdk.org Mon May 13 13:11:15 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 13:11:15 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Message-ID: Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. Found bugs: - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. There are other concerns: bugs and performance issues. Possible bugs: - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. Performance issues: - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. The backout is not clean because of removal of `CompiledMethod`. 
Tested with release and fastdebug builds: tier1 and tier2 passed. ------------- Commit messages: - 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Changes: https://git.openjdk.org/jdk/pull/19215/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19215&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332111 Stats: 380 lines in 15 files changed: 3 ins; 347 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/19215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19215/head:pull/19215 PR: https://git.openjdk.org/jdk/pull/19215 From shade at openjdk.org Mon May 13 13:21:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 May 2024 13:21:05 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <_Hf9ur_fzBA6MysoCZHn7KAjJwC0ubP8v4SKBvethOw=.63d58c21-c8ef-4b5a-b878-7fd330e0d654@github.com> On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. 
> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. The reversal looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19215#pullrequestreview-2052683089 From erikj at openjdk.org Mon May 13 13:23:11 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 13 May 2024 13:23:11 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 11:47:38 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. 
More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix another typo > - Fix typo > - Add more comments Build changes look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2107563120 From weijun at openjdk.org Mon May 13 13:48:23 2024 From: weijun at openjdk.org (Weijun Wang) Date: Mon, 13 May 2024 13:48:23 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 11:47:38 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. 
In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix another typo > - Fix typo > - Add more comments security changes (`java.security.jgss`, `jdk.crypto.cryptoki`, `jdk.crypto.mscapi`, and `jdk.security.auth`) look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2107621474 From dchuyko at openjdk.org Mon May 13 13:55:10 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Mon, 13 May 2024 13:55:10 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. 
REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107638223 From dfuchs at openjdk.org Mon May 13 14:18:11 2024 From: dfuchs at openjdk.org (Daniel Fuchs) Date: Mon, 13 May 2024 14:18:11 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 11:47:38 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix another typo > - Fix typo > - Add more comments Changes to jdk.net and jdk.sctp look ok. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2107695217 From jsjolen at openjdk.org Mon May 13 14:18:39 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 13 May 2024 14:18:39 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v77] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with five additional commits since the last revision:

 - Fix comparison in find
 - Fix the assertions
 - Missing refactoring
 - Tests for treap
 - Make allocator concrete, and use this to create tests for the treap

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/6e30331a..327094bc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=76
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=75-76

  Stats: 184 lines in 3 files changed: 156 ins; 4 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From eastigeevich at openjdk.org Mon May 13 14:24:18 2024
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Mon, 13 May 2024 14:24:18 GMT
Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives
In-Reply-To: References: Message-ID:

On Mon, 13 May 2024 13:52:17 GMT, Dmitry Chuyko wrote:

> Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality.

Yes, there are:
> 1. Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, CodeCache::recompile_marked_directives_matches will be traversing nmethods most of which don't need recompilation.
> 2. has_matching_directives might not be cleared.
> 3. A Java method is not recompiled as requested.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107720199 From stuefe at openjdk.org Mon May 13 14:31:10 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 May 2024 14:31:10 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:16:36 GMT, Stefan Karlsson wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Update mallocLimit.hpp > > src/hotspot/share/nmt/memflags.hpp line 30: > >> 28: #include "utilities/globalDefinitions.hpp" >> 29: >> 30: #define MEMORY_TYPES_DO(f) \ > > Open-ended comment/question: We call it MEMORY_TYPE and mt, but then we call the type MEMFLAGS (with a completely non-standard UPPERCASE style). Maybe it is time to rename MEMFLAGS? I don't feel like starting that particular bike shedding discussion :) But sure, sometime in the future we should do this. Here, I want it to be a simple renaming change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598580049 From stuefe at openjdk.org Mon May 13 14:34:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 May 2024 14:34:04 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:13:50 GMT, Stefan Karlsson wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Update mallocLimit.hpp > > src/hotspot/share/nmt/memflags.cpp line 31: > >> 29: >> 30: // Extra insurance that MEMFLAGS truly has the same size as uint8_t. >> 31: STATIC_ASSERT(sizeof(MEMFLAGS) == sizeof(uint8_t)); > > I think you can remove this entire .cpp file. There's no need to check the size of an enum with a specified base type. I rather have this explicit check. If MEMFLAGS>1byte, things break, and I would like to make that explicit. 
That said, I can move this static assert to the header. I just wanted to avoid including debug.hpp. My original intent was for this cpp file to be the place in the future for any MEMFLAGS related utility functions, e.g. to-and-from-string conversations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598583814 From dchuyko at openjdk.org Mon May 13 14:37:03 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Mon, 13 May 2024 14:37:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> On Mon, 13 May 2024 14:21:35 GMT, Evgeny Astigeevich wrote: > > Are there any high severity problems caused by the original PR? Especially not in the new functionality. Minor issues could be probably addressed without backing out the entire functionality. > > > > Yes, there are: > > > > > 1. Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, CodeCache::recompile_marked_directives_matches will be traversing nmethods most of which don't need recompilation. > > > 2. has_matching_directives might not be cleared. > > > 3. A Java method is not recompiled as requested. > > So there are cases when new functionality doesn't work as expected (I don't see any other users impacted). Why not file bugs for those cases and estimate their impact? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107777980 From eastigeevich at openjdk.org Mon May 13 14:45:02 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:45:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> References: <7kfgb5FXqda4SzqPO2XUXdx6CM_Z-G970nSpqvJVSYw=.b6b01073-66af-4c7d-8d7c-528a4f87707d@github.com> Message-ID: On Mon, 13 May 2024 14:34:50 GMT, Dmitry Chuyko wrote: > So there are cases when new functionality doesn't work as expected (I don't see any other users impacted). Why not file bugs for those cases and estimate their impact? Do you know any users using the new functionality? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107799744 From eastigeevich at openjdk.org Mon May 13 14:45:03 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 14:45:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. 
> > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. IMO if nobody uses it and the amount of code is small, it is better to back out it and to reimplement it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2107809381 From stefank at openjdk.org Mon May 13 14:54:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 14:54:05 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 14:31:22 GMT, Thomas Stuefe wrote: >> src/hotspot/share/nmt/memflags.cpp line 31: >> >>> 29: >>> 30: // Extra insurance that MEMFLAGS truly has the same size as uint8_t. >>> 31: STATIC_ASSERT(sizeof(MEMFLAGS) == sizeof(uint8_t)); >> >> I think you can remove this entire .cpp file. There's no need to check the size of an enum with a specified base type. > > I rather have this explicit check. 
If MEMFLAGS>1byte, things break, and I would like to make that explicit. > > That said, I can move this static assert to the header. I just wanted to avoid including debug.hpp. My original intent was for this cpp file to be the place in the future for any MEMFLAGS related utility functions, e.g. to-and-from-string conversations. Could you instead put the static_assert near the code that will break? Right now it looks obscure and weird to have this check when it is obviously correct as long as no one changes the definition. Would it be enough to write a comment in the header that this needs to be 1 byte? >> src/hotspot/share/nmt/memflags.hpp line 30: >> >>> 28: #include "utilities/globalDefinitions.hpp" >>> 29: >>> 30: #define MEMORY_TYPES_DO(f) \ >> >> Open-ended comment/question: We call it MEMORY_TYPE and mt, but then we call the type MEMFLAGS (with a completely non-standard UPPERCASE style). Maybe it is time to rename MEMFLAGS? > > I don't feel like starting that particular bike shedding discussion :) But sure, sometime in the future we should do this. Here, I want it to be a simple renaming change. Right. That's why I prefixed this with "Open-ended comment/question", trying to make it super clear that it wasn't intended as a request for this PR, but rather a way to at least plant the seed of an idea that we might want to fix this eyesore. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598603277 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598608110 From dcubed at openjdk.org Mon May 13 15:14:10 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 May 2024 15:14:10 GMT Subject: RFR: 8332066: AArch64: Math test failures since JDK-8331558 In-Reply-To: References: Message-ID: On Fri, 10 May 2024 13:12:27 GMT, Andrew Haley wrote: > Revert "8331558: AArch64: optimize integer remainder" > This reverts commit dab92c51c70767abcda3b1a91dd7d1a9b40290c1. 
[JDK-8332066](https://bugs.openjdk.org/browse/JDK-8332066) should have been renamed to: [BACKOUT] AArch64: optimize integer remainder or [BACKOUT] JDK-8331558 AArch64: optimize integer remainder I believe that this scenario is covered by: Alternative 2 - an investigation issue was created (I), and during the investigation backing out the change is identified as the best solution. Use the investigation issue (I) for the backout. Change summary of (I) to the same as (O) and prefix with [BACKOUT]. Move and change type of (I) to become a sub-task of (R). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19177#issuecomment-2107934786 From dcubed at openjdk.org Mon May 13 15:30:07 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 May 2024 15:30:07 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: <3E6BcBm0RVHLAGmuNjoYoeNs5-JLcYT5KgKVfmcxYAc=.4537bffa-ea3c-43f2-bc13-16710444b355@github.com> On Mon, 13 May 2024 14:44:05 GMT, Stefan Karlsson wrote: >> I rather have this explicit check. If MEMFLAGS>1byte, things break, and I would like to make that explicit. >> >> That said, I can move this static assert to the header. I just wanted to avoid including debug.hpp. My original intent was for this cpp file to be the place in the future for any MEMFLAGS related utility functions, e.g. to-and-from-string conversations. > > Could you instead put the static_assert near the code that will break? Right now it looks obscure and weird to have this check when it is obviously correct as long as no one changes the definition. Would it be enough to write a comment in the header that this needs to be 1 byte? To quote @robehn - Why write a comment for a rule if you can enforce it with code instead... 
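
For readers following the MEMFLAGS size discussion above, here is a minimal sketch of the "enforce it with code" idea, with the check placed next to the code that would actually break. All names in it (`MemTagSketch`, `HeaderSketch`, `pack_tag`, `unpack_tag`, `make_header`) are hypothetical stand-ins, not HotSpot identifiers, and it uses plain C++11 `static_assert` rather than HotSpot's `STATIC_ASSERT` macro. It only illustrates the trade-off being debated and is not what the PR does.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a one-byte tag enum such as MEMFLAGS.
enum class MemTagSketch : std::uint8_t {
  java_heap,
  gc,
  none
};

// Hypothetical consumer that stores the tag in a single byte.
// This is the code that would break if the enum ever grew.
struct HeaderSketch {
  std::uint32_t size;
  std::uint8_t  tag;
};

// The rule lives beside the code that depends on it, so the "why" is visible.
static_assert(sizeof(MemTagSketch) == sizeof(HeaderSketch::tag),
              "tag enum must fit the one-byte header field");
static_assert(std::is_same<std::underlying_type<MemTagSketch>::type,
                           std::uint8_t>::value,
              "tag enum must keep uint8_t as its underlying type");

// Narrowing-free conversions that are only safe because of the asserts above.
inline std::uint8_t pack_tag(MemTagSketch t) {
  return static_cast<std::uint8_t>(t);
}

inline MemTagSketch unpack_tag(std::uint8_t raw) {
  return static_cast<MemTagSketch>(raw);
}

inline HeaderSketch make_header(std::uint32_t size, MemTagSketch t) {
  HeaderSketch h;
  h.size = size;
  h.tag = pack_tag(t);
  return h;
}
```

With a fixed underlying type the first assertion is redundant today, which is the reviewer's point; its value is that a later change to the enum definition fails at compile time in the place that depends on the size.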
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598665179 From alanb at openjdk.org Mon May 13 15:35:08 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 13 May 2024 15:35:08 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 11:47:38 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix another typo > - Fix typo > - Add more comments src/hotspot/share/runtime/arguments.cpp line 2271: > 2269: } else if (match_option(option, "--illegal-native-access=", &tail)) { > 2270: if (!create_module_property("jdk.module.illegal.native.access", tail, InternalProperty)) { > 2271: return JNI_ENOMEM; I think it would be helpful to get guidance on if this is the right way to add this system property, only because this one not a "module property". The configuration (WriteableProperty + InternalProperty) look right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1598673962 From stefank at openjdk.org Mon May 13 15:39:17 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 15:39:17 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: <3E6BcBm0RVHLAGmuNjoYoeNs5-JLcYT5KgKVfmcxYAc=.4537bffa-ea3c-43f2-bc13-16710444b355@github.com> References: <3E6BcBm0RVHLAGmuNjoYoeNs5-JLcYT5KgKVfmcxYAc=.4537bffa-ea3c-43f2-bc13-16710444b355@github.com> Message-ID: <6qHBMgu-ZgoMPPym-sSJJR9szoVattW-JuHeXaX4JY0=.05710a47-19fe-48ba-920c-218c16473fbb@github.com> On Mon, 13 May 2024 15:26:18 GMT, Daniel D. Daugherty wrote: >> Could you instead put the static_assert near the code that will break? Right now it looks obscure and weird to have this check when it is obviously correct as long as no one changes the definition. Would it be enough to write a comment in the header that this needs to be 1 byte? > > To quote @robehn - Why write a comment for a rule if you can enforce it with code instead... I tend to agree with that. My earlier question still stands is there a better place to put it? 
Right now the "enforced" code in a stand-alone file doesn't tell me "why" this is important. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598679057 From stefank at openjdk.org Mon May 13 15:51:11 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 15:51:11 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: <6qHBMgu-ZgoMPPym-sSJJR9szoVattW-JuHeXaX4JY0=.05710a47-19fe-48ba-920c-218c16473fbb@github.com> References: <3E6BcBm0RVHLAGmuNjoYoeNs5-JLcYT5KgKVfmcxYAc=.4537bffa-ea3c-43f2-bc13-16710444b355@github.com> <6qHBMgu-ZgoMPPym-sSJJR9szoVattW-JuHeXaX4JY0=.05710a47-19fe-48ba-920c-218c16473fbb@github.com> Message-ID: On Mon, 13 May 2024 15:36:13 GMT, Stefan Karlsson wrote: >> To quote @robehn - Why write a comment for a rule if you can enforce it with code instead... > > I tend to agree with that. My earlier question still stands: is there a better place to put it? Right now the "enforced with code" in a stand-alone file doesn't tell me "why" this rule is important. If you want to keep the static_assert it in the .cpp file, then I won't block that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1598695748 From kevinw at openjdk.org Mon May 13 15:54:07 2024 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 13 May 2024 15:54:07 GMT Subject: RFR: 8330755: ProblemList files have entries referring to non-existent tests [v2] In-Reply-To: References: <5vZvc83Zn4IhI5s_IdYqRqw4zjWF93TcQUzl2cD5JLU=.12464c13-9ccc-47d8-851e-883f3fea4a04@github.com> Message-ID: On Wed, 24 Apr 2024 10:50:44 GMT, Doug Simon wrote: >> This PR adds a check for the format of ProblemList files and ensures they only have entries referring to existing tests. >> >> The cleanups in the second commit of this PR were done based on the output of `CheckProblemLists`: >> >>> make test TEST=build/problemLists/CheckProblemLists.java >> ... 
>> STDOUT: >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-Virtual.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-Xcomp.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-generational-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jaxp/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Xcomp.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-generational-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/langtools/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/lib-test/ProblemList.txt >> Checked 13 problem list files >> Test roots: >> /Users/dnsimon/dev/jdk-jdk/open/test/jdk >> /Users/dnsimon/dev/jdk-jdk/open/test/lib-test >> /Users/dnsimon/dev/jdk-jdk/open/test/failure_handler/test >> /Users/dnsimon/dev/jdk-jdk/open/test/jaxp >> /Users/dnsimon/dev/jdk-jdk/open/test/langtools >> /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg >> Following errors found: >> /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList.txt:174: vmTestbase/gc/lock/jni/jnilock002/TestDescription.java does not exist under any test root >> vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8192647 generic-all >> >> /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt:77: TestAndIssue[test=java/util/Properties/StoreReproducibilityTest.java, issueId=0000000] duplicates /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt:76 >> java/util/Properties/StoreReproducibilityTest.java 0000000 generic-all >> >> 
/Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList.txt:516: java/lang/management/MemoryMXBean/PendingAllGC.sh does not exist under any test root >> java/lang/management/MemoryMXBean/PendingAllGC.sh 8158837 generic-all >> >> ... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed CheckProblemLists.java The tidyup looks good! I don't understand that this is titled as JDK-8330755 but that's already integrated. So this needs to be done in a separate JBS entry and if the suggested CheckProblemLists.java is not going to be in it, we remove that from the description. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18879#issuecomment-2108075635 From lmesnik at openjdk.org Mon May 13 15:56:17 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 13 May 2024 15:56:17 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't be Finalizable In-Reply-To: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> Message-ID: On Sun, 12 May 2024 21:34:41 GMT, Leonid Mesnik wrote: > The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook. > The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 > > The cleanup() method flush stream and print summary which should be already printed by complain method. > > This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. However, the error should have been printed already in this method. > > So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests. > > Note: The 'verboseOnErrorEnabled' is just not used. 
>
> See isVerboseOnErrorEnabled.
>
>     public boolean isVerboseOnErrorEnabled() {
>         return errorsSummaryEnabled;
>     }
>
>
> Tested with by running tests with different combinations (tier4-7) and tier1.

Every log (like any Finalizable object) is registered using registerCleanup()
https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/share/Finalizable.java#L59
This function adds the object to the FinalizerThread stack. This stack is processed and the cleanup method is called for each object during shutdown.
See
https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/share/Finalizer.java#L105
for adding the hook and
https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/share/Finalizer.java#L118
for processing methods.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19209#issuecomment-2108082162

From lmesnik at openjdk.org Mon May 13 16:00:14 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Mon, 13 May 2024 16:00:14 GMT
Subject: RFR: 8332112: Update nsk.share.Log to don't be Finalizable
In-Reply-To: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com>
References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com>
Message-ID:

On Sun, 12 May 2024 21:34:41 GMT, Leonid Mesnik wrote:

> The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook.
> The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760
>
> The cleanup() method flush stream and print summary which should be already printed by complain method.
>
> This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. However, the error should have been printed already in this method.
>
> So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests.
>
> Note: The 'verboseOnErrorEnabled' is just not used.
>
> See isVerboseOnErrorEnabled.
>
>     public boolean isVerboseOnErrorEnabled() {
>         return errorsSummaryEnabled;
>     }
>
>
> Tested with by running tests with different combinations (tier4-7) and tier1.

Please note that a shutdown hook is not compatible with jtreg agentvm execution. It is not recommended to call System.exit() or to do something after main() in jtreg. Even in main/othervm mode jtreg calls the test class through a wrapper. This worked differently in tonga and needs to be adapted. So the plan is to remove such cleanup as much as possible.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19209#issuecomment-2108091553

From ascarpino at openjdk.org Mon May 13 16:22:09 2024
From: ascarpino at openjdk.org (Anthony Scarpino)
Date: Mon, 13 May 2024 16:22:09 GMT
Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]
In-Reply-To: References: Message-ID:

On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski wrote:

>> Performance. Before:
>>
>> Benchmark                                            (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
>> SignatureBench.ECDSA.sign                        SHA256withECDSA        1024          256              thrpt    3  6443.934 ±   6.491  ops/s
>> SignatureBench.ECDSA.sign                        SHA256withECDSA       16384          256              thrpt    3  6152.979 ±   4.954  ops/s
>> SignatureBench.ECDSA.verify                      SHA256withECDSA        1024          256              thrpt    3  1895.410 ±  36.979  ops/s
>> SignatureBench.ECDSA.verify                      SHA256withECDSA       16384          256              thrpt    3  1878.955 ±  45.487  ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1357.810 ±  26.584  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1352.119 ±  23.547  ops/s
>> Benchmark                          (isMontBench)   Mode  Cnt     Score     Error  Units
>> PolynomialP256Bench.benchMultiply          false  thrpt    3  1746.126 ±  10.970  ops/s
>>
>> Performance, no intrinsic:
>>
>> Benchmark                                            (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
>> SignatureBench.ECDSA.sign                        SHA256withECDSA        1024          256              thrpt    3  6529.839 ±  42.420  ops/s
>> SignatureBench.ECDSA.sign                        SHA256withECDSA       16384          256              thrpt    3  6199.747 ± 133.566  ops/s
>> SignatureBench.ECDSA.verify                      SHA256withECDSA        1024          256              thrpt    3  1973.676 ±  54.071  ops/s
>> SignatureBench.ECDSA.verify                      SHA256withECDSA       16384          256              thrpt    3  1932.127 ±  35.920  ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1355.788 ±  29.858  ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1346.523 ±  28.722  ops/s
>> Benchmark                          (isMontBench)   Mode  Cnt     Score     Error  Units
>> PolynomialP256Bench.benchMultiply           true  thrpt    3  1919.57...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   whitespace

The changes look good and have passed testing

-------------

Marked as reviewed by ascarpino (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18583#pullrequestreview-2053158639

From kvn at openjdk.org Mon May 13 16:32:15 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 13 May 2024 16:32:15 GMT
Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives
In-Reply-To: References: Message-ID:

On Mon, 13 May 2024 14:42:26 GMT, Evgeny Astigeevich wrote:

>> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749).
>>
>> Found bugs:
>> - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack.
As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. >> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. >> >> There are other concerns: bugs and performance issues. >> >> Possible bugs: >> - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. >> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. >> - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. >> >> Performance issues: >> - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. >> >> The backout is not clean because of removal of `CompiledMethod`. >> >> Tested with release and fastdebug builds: tier1 and tier2 passed. > > IMO if nobody uses it and the amount of code is small, it is better to back out it and to reimplement it. @eastig do you have tests which shows issues you listed in description? I don't see any reference to them in this sub-task and in [REDO] bug [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). How you found these issues? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108154151 From never at openjdk.org Mon May 13 17:14:05 2024 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 13 May 2024 17:14:05 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v5] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 07:23:14 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. [JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Default to non generational ZGC with JVMCI Sorry I didn't reply sooner. That sounds like a fine interim solution. I agree that automatically selecting something different can be very confusing. Anyway, hopefully this kind of problem will be a transient one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18393#issuecomment-2108274487 From stefank at openjdk.org Mon May 13 17:20:14 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 May 2024 17:20:14 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v5] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 07:23:14 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. [JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Default to non generational ZGC with JVMCI Thanks! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18393#issuecomment-2108303189

From cjplummer at openjdk.org  Mon May 13 17:30:05 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 13 May 2024 17:30:05 GMT
Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v9]
In-Reply-To:
References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com>
 <7Aud9EX-Q09Bx3MmZjM182gBp9sDmbvIt7rSmtBa1FM=.cc43a81c-7431-484d-9eae-295da93c9a52@github.com>
 <3x1oThcCfOj6FR0ZJoH5ipYkrHTFAzrgJXm69Tggb8k=.83dba355-787a-4f05-a721-df5aee8fd810@github.com>
Message-ID:

On Sat, 11 May 2024 11:12:44 GMT, Lei Zaakjyu wrote:

> > I noticed that the HeapRegionManager and HeapRegionClosure classes were not renamed (in the hotspot source). Is this intentional or an oversight?
>
> OK, I will do all the SA part here. However, I do think that the other classes named 'HeapRegion*' in the hotspot source should be dealt with in follow-up PRs.

I think there was a misunderstanding here. I was not asking you to rename these in SA. I was asking why they were not renamed in hotspot. If you want to do them in a follow-up then that is ok, but both hotspot and SA should be done together. So that means either undoing the most recent SA change or applying the same rename change to hotspot.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18871#issuecomment-2108342113

From duke at openjdk.org  Mon May 13 17:34:13 2024
From: duke at openjdk.org (duke)
Date: Mon, 13 May 2024 17:34:13 GMT
Subject: Withdrawn: 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Feb 2024 09:10:51 GMT, Julian Waters wrote:

> With clang 13 being the minimum required JDK-8325878, the noreturn bug that requires the ATTRIBUTE_NORETURN workaround now vanishes, and we can use [[noreturn]] directly within HotSpot.
We should remove the workaround as soon as possible, given the chance This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17868 From cjplummer at openjdk.org Mon May 13 17:35:08 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 13 May 2024 17:35:08 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v10] In-Reply-To: References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: On Sun, 12 May 2024 03:04:45 GMT, Lei Zaakjyu wrote: > Should we also rename 'HeapRegionType' to 'G1HeapRegionType', then rename the current 'G1HeapRegionType' to 'G1 HeapRegionTypeEnum'? For this PR the SA renames should match the hotspot renames. It looks like you have not renamed this in hotspot yet so it should not be renamed in SA either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18871#issuecomment-2108362277 From dnsimon at openjdk.org Mon May 13 17:42:15 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 May 2024 17:42:15 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v5] In-Reply-To: References: Message-ID: <4YJzxY5ls_dBK_4FvPh-fKw7yr0aXQbKneMIQu8N53k=.f2f6fc2c-dbef-41ff-85af-cc57a421a980@github.com> On Mon, 6 May 2024 07:23:14 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. [JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Default to non generational ZGC with JVMCI After discussing with Tom offline, here's one more change you could make in this PR: * Change the warning in `check_jvmci_supported_gc` to a hard error when an unsupported GC + compiler combination is requested. Either way, as soon we have Gen ZGC implemented for Graal, we will: 1. 
Update `gc_supports_jvmci`. 2. Update `check_jvmci_supported_gc` to no longer warn and switch compilers but to exit the VM when an unsupported combination is requested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18393#issuecomment-2108387245 From kbarrett at openjdk.org Mon May 13 18:30:04 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 13 May 2024 18:30:04 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: <4m9MF6p8mpiKijVsc-mg0IDBQE9lLvPZQjqzZeV1kQo=.97de93cf-8f20-406a-886b-a7c98bd3ccb1@github.com> On Mon, 13 May 2024 04:55:24 GMT, Thomas Stuefe wrote: >> MEMFLAGS, as well as its enum constants, should live in its own include. >> >> The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. >> >> The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. >> >> --- >> >> This patch moves the enum to its new file. >> >> It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. >> >> For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. >> >> I tested (built) on: >> - MacOS aarch64, no precompiled headers, fastdebug >> - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Update mallocLimit.hpp Looks good, subject to addressing the minor issues already reported by others. Leaving unapproved, but don't wait for my approval once you have others. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19172#pullrequestreview-2053415755 From wkemper at openjdk.org Mon May 13 18:40:13 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 13 May 2024 18:40:13 GMT Subject: RFR: 8332082: Shenandoah: Use SATB active flag for C2 pre-write barrier on x86 and PPC In-Reply-To: References: Message-ID: On Fri, 10 May 2024 16:13:51 GMT, William Kemper wrote: > This is consistent with c1 and other platforms. Both the gc-state and satb-active flags are synchronized to thread local fields on a safepoint. I expect the cost for the barrier to access either of them is similar. However, there is some additional c2 code that looks like it recognizes the barrier IR based on the presence of nodes to load the gc state: (for example, `ShenandoahBarrierSetC2::is_shenandoah_marking_if`). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19180#issuecomment-2108547726 From wkemper at openjdk.org Mon May 13 18:45:05 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 13 May 2024 18:45:05 GMT Subject: RFR: 8332082: Shenandoah: Use SATB active flag for C2 pre-write barrier on x86 and PPC In-Reply-To: References: Message-ID: <77VGGeS1lFsQ5_Qp5jZkGIqSw5dDgmTTJtH5QmoA4eo=.8d9d1728-0e62-4a31-a109-4268c916416a@github.com> On Fri, 10 May 2024 16:13:51 GMT, William Kemper wrote: > This is consistent with c1 and other platforms. Also, as pointed out in https://github.com/openjdk/jdk/pull/18148, `ShenandoahBarrierSetC2::verify_gc_barriers` is looking for the IR nodes that check `ShenandoahThreadLocalData::satb_mark_queue_active_offset`, so inconsistencies abound. I've never seen `verify_gc_barriers` raise an `assert` , even though it is `trueInDebug`. Is `verify_gc_barriers` broken? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19180#issuecomment-2108557318 From kbarrett at openjdk.org Mon May 13 18:54:09 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 13 May 2024 18:54:09 GMT Subject: RFR: 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]] In-Reply-To: References: Message-ID: On Thu, 15 Feb 2024 09:10:51 GMT, Julian Waters wrote: > With clang 13 being the minimum required JDK-8325878, the noreturn bug that requires the ATTRIBUTE_NORETURN workaround now vanishes, and we can use [[noreturn]] directly within HotSpot. We should remove the workaround as soon as possible, given the chance This seems to have gotten lost while waiting for JDK-8325878. Please re-open and update. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17868#issuecomment-2108570742 From dnsimon at openjdk.org Mon May 13 19:40:22 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 May 2024 19:40:22 GMT Subject: RFR: 8330755: ProblemList files have entries referring to non-existent tests [v2] In-Reply-To: References: <5vZvc83Zn4IhI5s_IdYqRqw4zjWF93TcQUzl2cD5JLU=.12464c13-9ccc-47d8-851e-883f3fea4a04@github.com> Message-ID: On Wed, 24 Apr 2024 10:50:44 GMT, Doug Simon wrote: >> This PR adds a check for the format of ProblemList files and ensures they only have entries referring to existing tests. >> >> The cleanups in the second commit of this PR were done based on the output of `CheckProblemLists`: >> >>> make test TEST=build/problemLists/CheckProblemLists.java >> ... 
>> STDOUT: >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-Virtual.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-Xcomp.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-generational-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jaxp/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Xcomp.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-generational-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-zgc.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/langtools/ProblemList.txt >> Checking /Users/dnsimon/dev/jdk-jdk/open/test/lib-test/ProblemList.txt >> Checked 13 problem list files >> Test roots: >> /Users/dnsimon/dev/jdk-jdk/open/test/jdk >> /Users/dnsimon/dev/jdk-jdk/open/test/lib-test >> /Users/dnsimon/dev/jdk-jdk/open/test/failure_handler/test >> /Users/dnsimon/dev/jdk-jdk/open/test/jaxp >> /Users/dnsimon/dev/jdk-jdk/open/test/langtools >> /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg >> Following errors found: >> /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList.txt:174: vmTestbase/gc/lock/jni/jnilock002/TestDescription.java does not exist under any test root >> vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8192647 generic-all >> >> /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt:77: TestAndIssue[test=java/util/Properties/StoreReproducibilityTest.java, issueId=0000000] duplicates /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt:76 >> java/util/Properties/StoreReproducibilityTest.java 0000000 generic-all >> >> 
/Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList.txt:516: java/lang/management/MemoryMXBean/PendingAllGC.sh does not exist under any test root >> java/lang/management/MemoryMXBean/PendingAllGC.sh 8158837 generic-all >> >> ... > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed CheckProblemLists.java The issue was resolved when I merged the PR to clean up the closed problem lists. I'll just close this PR and leave it as documentation for future open ProblemList cleanup if someone wants to take it on. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18879#issuecomment-2108657466 From dnsimon at openjdk.org Mon May 13 19:40:22 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 May 2024 19:40:22 GMT Subject: Withdrawn: 8330755: ProblemList files have entries referring to non-existent tests In-Reply-To: <5vZvc83Zn4IhI5s_IdYqRqw4zjWF93TcQUzl2cD5JLU=.12464c13-9ccc-47d8-851e-883f3fea4a04@github.com> References: <5vZvc83Zn4IhI5s_IdYqRqw4zjWF93TcQUzl2cD5JLU=.12464c13-9ccc-47d8-851e-883f3fea4a04@github.com> Message-ID: On Sun, 21 Apr 2024 22:00:52 GMT, Doug Simon wrote: > This PR adds a check for the format of ProblemList files and ensures they only have entries referring to existing tests. > > The cleanups in the second commit of this PR were done based on the output of `CheckProblemLists`: > >> make test TEST=build/problemLists/CheckProblemLists.java > ... 
> STDOUT: > Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-Virtual.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-Xcomp.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-generational-zgc.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList-zgc.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/jaxp/ProblemList.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Xcomp.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-generational-zgc.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-zgc.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/langtools/ProblemList.txt > Checking /Users/dnsimon/dev/jdk-jdk/open/test/lib-test/ProblemList.txt > Checked 13 problem list files > Test roots: > /Users/dnsimon/dev/jdk-jdk/open/test/jdk > /Users/dnsimon/dev/jdk-jdk/open/test/lib-test > /Users/dnsimon/dev/jdk-jdk/open/test/failure_handler/test > /Users/dnsimon/dev/jdk-jdk/open/test/jaxp > /Users/dnsimon/dev/jdk-jdk/open/test/langtools > /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg > Following errors found: > /Users/dnsimon/dev/jdk-jdk/open/test/hotspot/jtreg/ProblemList.txt:174: vmTestbase/gc/lock/jni/jnilock002/TestDescription.java does not exist under any test root > vmTestbase/gc/lock/jni/jnilock002/TestDescription.java 8192647 generic-all > > /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt:77: TestAndIssue[test=java/util/Properties/StoreReproducibilityTest.java, issueId=0000000] duplicates /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList-Virtual.txt:76 > java/util/Properties/StoreReproducibilityTest.java 0000000 generic-all > > 
/Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList.txt:516: java/lang/management/MemoryMXBean/PendingAllGC.sh does not exist under any test root
> java/lang/management/MemoryMXBean/PendingAllGC.sh 8158837 generic-all
>
> /Users/dnsimon/dev/jdk-jdk/open/test/jdk/ProblemList.txt:667: javax/swing/JFi...

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18879

From eastigeevich at openjdk.org  Mon May 13 20:37:40 2024
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Mon, 13 May 2024 20:37:40 GMT
Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 16:29:35 GMT, Vladimir Kozlov wrote:

> do you have tests which shows issues you listed in description?

Here is a jtreg test:

- `refresh_control.02.txt`

[
  {
    match: "serviceability.dcmd.compiler.DirectivesRefreshTest::callable",
    c2: {
      PrintOptoAssembly: true
    }
  }
]

- `DirectivesRefreshTest02.java`

/**
 * @test DirectivesRefreshTest02
 * @summary Test of forced recompile after compiler directives changes by diagnostic command
 * @requires vm.compiler1.enabled & vm.compiler2.enabled
 * @library /test/lib /
 * @modules java.base/jdk.internal.misc
 *
 * @build jdk.test.whitebox.WhiteBox
 * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox
 *
 * @run main/othervm -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI
 *                   -XX:+BackgroundCompilation -Xlog:codecache=trace -XX:-Inline -XX:+TieredCompilation -XX:CICompilerCount=2
 *                   -XX:+UnlockDiagnosticVMOptions
 *                   serviceability.dcmd.compiler.DirectivesRefreshTest02
 */

package serviceability.dcmd.compiler;

import jdk.test.whitebox.WhiteBox;

import jdk.test.lib.process.OutputAnalyzer;
import jdk.test.lib.dcmd.CommandExecutor;
import jdk.test.lib.dcmd.JMXExecutor;

import java.nio.file.Path;
import java.nio.file.Paths;
import java.lang.reflect.Method;
import java.util.Random;

import static jdk.test.lib.Asserts.assertEQ;

import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_NONE;
import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_SIMPLE;
import static compiler.whitebox.CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION;

public class DirectivesRefreshTest02 {

    static Path cmdPath = Paths.get(System.getProperty("test.src", "."), "refresh_control.02.txt");
    static WhiteBox wb = WhiteBox.getWhiteBox();
    static Random random = new Random();

    static Method method;
    static CommandExecutor executor;

    static int callable() {
        int result = 0;
        for (int i = 0; i < 100; i++) {
            result += random.nextInt(100);
        }
        return result;
    }

    static void setup() throws Exception {
        method = DirectivesRefreshTest.class.getDeclaredMethod("callable");
        executor = new JMXExecutor();

        wb.enqueueMethodForCompilation(method, COMP_LEVEL_SIMPLE);
        while (wb.isMethodQueuedForCompilation(method)) {
            Thread.onSpinWait();
        }
        wb.lockCompilation();
        boolean r = wb.enqueueMethodForCompilation(method, COMP_LEVEL_FULL_OPTIMIZATION);
        System.out.println("Method enqueued: " + r);
    }

    static void testDirectivesAddRefresh() {
        var output = executor.execute("Compiler.directives_add -r " + cmdPath.toString());
        output.stderrShouldBeEmpty().shouldContain("1 compiler directives added");
        System.out.println("Method enqueued: " + wb.isMethodQueuedForCompilation(method));
        wb.unlockCompilation();
        wb.enqueueMethodForCompilation(method, COMP_LEVEL_FULL_OPTIMIZATION);
        while (wb.isMethodQueuedForCompilation(method)) {
            Thread.onSpinWait();
        }
        System.out.println("Method compilation level: " + wb.getMethodCompilationLevel(method));
        assertEQ(true, false, "Stop here");
    }

    public static void main(String[] args) throws Exception {
        setup();
        testDirectivesAddRefresh();
    }
}

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108744800

From eastigeevich at openjdk.org  Mon May 13 20:40:49 2024
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Mon, 13 May 2024 20:40:49 GMT
Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote:

> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749).
>
> Found bugs:
> - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected.
> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored.
>
> There are other concerns: bugs and performance issues.
>
> Possible bugs:
> - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark.
> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`.
`CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. There is no `PrintOptoAssembly` in output. I use `lockCompilation()`/`unlockCompilation()` to simulate: > A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. I think using them we can also simulate, though it would not be easy to write a test: > JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108759073 From eastigeevich at openjdk.org Mon May 13 20:50:01 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 20:50:01 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> On Mon, 13 May 2024 16:29:35 GMT, Vladimir Kozlov wrote: > How you found these issues? I've been backporting JDK-8309271 to downstream 17 and 21. As compilations happens in background but a test from JDK-8309271 runs with background compilation off, I asked myself what might happen with background compilation. I have a patch fixing the test above. I don't think it is a complete fix. 
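
For readers following the thread, the queued-method scenario quoted above can be pictured with a small toy model. This is only a sketch and not HotSpot code: the class `StaleDirectiveQueueSketch` and its methods `enqueue`, `addDirective` and `compileNext` are hypothetical names, and the model simply assumes what the description states, namely that a method already in the queue stays associated with the old directives and that a second compile request for it is dropped.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Toy model only: none of these types exist in HotSpot.
public class StaleDirectiveQueueSketch {

    // A queued task remembers the directive version that was current at enqueue time
    // (an assumption of this sketch, mirroring "a method with old directives in the queue").
    static final class Task {
        final String method;
        final int directiveVersion;

        Task(String method, int directiveVersion) {
            this.method = method;
            this.directiveVersion = directiveVersion;
        }
    }

    private final Queue<Task> queue = new ArrayDeque<>();
    private final Set<String> queued = new HashSet<>();
    private int currentDirectiveVersion = 1;

    // A request for a method that is already queued is dropped.
    public boolean enqueue(String method) {
        if (!queued.add(method)) {
            return false;
        }
        queue.add(new Task(method, currentDirectiveVersion));
        return true;
    }

    // Stands in for adding a new directive while tasks are waiting.
    public void addDirective() {
        currentDirectiveVersion++;
    }

    // Returns the directive version the queued task was created with.
    public int compileNext() {
        Task task = queue.remove();
        queued.remove(task.method);
        return task.directiveVersion;
    }

    public static void main(String[] args) {
        StaleDirectiveQueueSketch q = new StaleDirectiveQueueSketch();
        System.out.println("first enqueue accepted: " + q.enqueue("callable"));
        q.addDirective(); // directives change while the task is still queued
        System.out.println("refresh enqueue accepted: " + q.enqueue("callable"));
        System.out.println("compiled with directive version: " + q.compileNext());
    }
}
```

In this model the second request is rejected and the method is compiled with version 1, which is the shape of the problem the `lockCompilation()`/`unlockCompilation()` test tries to provoke. Whether the real compile queue behaves this way in every case is exactly what the thread is discussing.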
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108770472 From eastigeevich at openjdk.org Mon May 13 21:11:02 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 13 May 2024 21:11:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. 
If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108802569 From kvn at openjdk.org Mon May 13 22:46:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:46:02 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> References: <43tyZlzDKG1-M3YMBjjSKx2R3OosZuyfQySaBuV_KTc=.45597f64-6ff7-4d83-8416-aa29154d92df@github.com> Message-ID: On Mon, 13 May 2024 20:46:06 GMT, Evgeny Astigeevich wrote: > There is a race among a thread updating directives, compiler threads and CodeCache cleaning threads. We don't properly lock the directives stack, the compile queue and CodeCache to manage the race. This is indeed concerning. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108925371 From kvn at openjdk.org Mon May 13 22:46:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:46:03 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 21:08:08 GMT, Evgeny Astigeevich wrote: > What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? I don't think this is correct way to fix the bug. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2108926307 From kvn at openjdk.org Mon May 13 22:52:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 May 2024 22:52:05 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote: > Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). > > Found bugs: > - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. 
If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. I agree with this backout. Thank you @eastig for explaining your point. We have about 3 weeks before RDP1 and it is better we have less issues before that. Let redo implementation in next release taking into account the issues you found and have more time for testing. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19215#pullrequestreview-2053940066 From ccheung at openjdk.org Mon May 13 23:02:27 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 13 May 2024 23:02:27 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: Message-ID: > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. 
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: comments from Ioi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18790/files - new: https://git.openjdk.org/jdk/pull/18790/files/3c1e8854..51b86d42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=01-02 Stats: 18 lines in 6 files changed: 0 ins; 6 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From ccheung at openjdk.org Mon May 13 23:02:30 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 13 May 2024 23:02:30 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v2] In-Reply-To: <7X-7dZ5vX54h9wzSJNIuDDqLxHZ6562nQLK2r9Kv54U=.c67a9f9f-714c-45f9-968b-4178a67f6fdd@github.com> References: <7X-7dZ5vX54h9wzSJNIuDDqLxHZ6562nQLK2r9Kv54U=.c67a9f9f-714c-45f9-968b-4178a67f6fdd@github.com> Message-ID: On Fri, 10 May 2024 22:58:25 GMT, Ioi Lam wrote: >> Calvin Cheung has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into xloginit-classloading >> - fix build issues on macos-x64 and -aarch64 >> - Merge branch 'master' into xloginit-classloading >> - fix linux-x86 and minimal build issues >> - 8330198: Add some class loading related perf counters to measure VM startup > > src/hotspot/share/runtime/java.cpp line 245: > >> 243: #else >> 244: >> 245: void print_method_invocation_histogram() {} > > Is this change necessary? No, I've removed it. 
> src/hotspot/share/runtime/perfData.hpp line 420:
>
>> 418: inline void inc(jlong val) { (*(jlong*)_valuep) += val; }
>> 419: inline void dec(jlong val) { inc(-val); }
>> 420: inline void reset() { (*(jlong*)_valuep) = 0; }
>
> This new function doesn't seem to be used.

Removed.

> src/hotspot/share/runtime/perfData.hpp line 835:
>
>> 833: public:
>> 834: inline PerfTraceTime(PerfLongCounter* timerp, bool is_on = true) : _timerp(timerp) {
>> 835: if (!is_on || !UsePerfData) return;
>
> Instead of having a separate `is_on` parameter, can we check for `timerp == nullptr` instead?

It works. I've pushed another commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1599184407
PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1599184566
PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1599184469

From sviswanathan at openjdk.org  Mon May 13 23:15:07 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 13 May 2024 23:15:07 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]
In-Reply-To:
References:
Message-ID:

On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1054: > 1052: } else if (isUL) { > 1053: __ movzbl(rTmp, Address(needle, 2)); > 1054: __ movdl(byte_1, rTmp); Should be: __ movdl(byte_2, rTmp); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1056: > 1054: __ movdl(byte_1, rTmp); > 1055: // 1st byte of needle in words > 1056: __ vpbroadcastw(byte_1, byte_1, Assembler::AVX_256bit); Should be: __ vpbroadcastw(byte_2, byte_2, Assembler::AVX_256bit); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599194092 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599194375 From duke at 
openjdk.org Mon May 13 23:54:09 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 13 May 2024 23:54:09 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4492: > 4490: > 4491: // Compare char[] or byte[] arrays aligned to 4 bytes or substrings. 
> 4492: void C2_MacroAssembler::arrays_equals(bool is_array_equ, Register ary1, I liked the old style better, fewer longer lines.. same for rest of the changes in this file. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4594: > 4592: #endif //_LP64 > 4593: bind(COMPARE_WIDE_VECTORS); > 4594: vmovdqu(vec1, Address(ary1, limit, create a local scale variable instead of ternary operators. Used several times. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > 4248: generate_chacha_stubs(); > 4249: > 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { Just `if (EnableX86ECoreOpts)`? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 391: > 389: } > 390: > 391: __ cmpq(needle_len, isU ? 2 : 1); Can we remove this comparison? i.e. - broadcast first and last character unconditionally (same character). Or - move broadcasts 'down' into individual cases.. There is already specialized code to handle needle of size 1.. This adds extra pathlength. (Will we actually call this intrinsic for needle_size==1? Assume length>=2?) src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1365: > 1363: // Compare first byte of needle to haystack > 1364: vpcmpeq(cmp_0, byte_0, Address(haystack, 0), Assembler::AVX_256bit); > 1365: if (size != (isU ? 2 : 1)) { `if (size != scale)` Though in this case, `elem_size` might hold more meaning. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1372: > 1370: > 1371: if (bytesToCompare > 2) { > 1372: if (size > (isU ? 4 : 2)) { `if (size > 2*scale)`? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1373: > 1371: if (bytesToCompare > 2) { > 1372: if (size > (isU ? 4 : 2)) { > 1373: if (doEarlyBailout) { Is there a big perf difference when `doEarlyBailout` is enabled? And/or just for this function? (i.e. removing `doEarlyBailout` in this function will mean less pathlength. Feels like a few extra vpands should be cheap enough.) 
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1469: > 1467: > 1468: if (isU && (size & 1)) { > 1469: __ emit_int8(0xcc); This should also be an `assert()` to catch this at compile-time. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1633: > 1631: if (isU) { > 1632: if ((size & 1) != 0) { > 1633: __ emit_int8(0xcc); Compile-time assert to ensure this code is never called instead? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1889: > 1887: // r13 = (needle length - 1) > 1888: // r14 = &needle > 1889: // r15 = unused There is quite a bit of redundancy in register usage. It's not incorrect, but it looks odd. It is not clear whether this duplication can easily be removed (or if/why it is needed). // rbx = &haystack // rdi = &haystack // rdx = &needle // r14 = &needle // rcx = haystack length // rsi = haystack length // r12 = needle length // r13 = (needle length - 1) // r10 = hs_len - needle len // rbp = -1 // rax = unused // r11 = unused // r8 = unused // r9 = unused // r15 = unused (Could this comment be out-of-sync with the code?
Looks like only rbx, r14 and temps out of unused registers are used few lines down) src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1950: > 1948: // r13 = (needle length - 1) > 1949: // r14 = &needle > 1950: // r15 = unused Same as for the small case ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592834449 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592838385 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1592831339 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599131482 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599146451 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599144855 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599143784 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599151000 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599204083 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599209564 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599213635 From sviswanathan at openjdk.org Tue May 14 00:51:08 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 14 May 2024 00:51:08 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Rearrange; add lambdas for clarity src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1083: > 1081: // haystack - the address of the first byte of the haystack > 1082: // hsLen - the sizeof the haystack > 1083: // isU - true if argument encoding is either UU or UL We need to list needleLen here as well? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1096: > 1094: MacroAssembler *_masm) { > 1095: > 1096: assert_different_registers(eq_mask, haystack, needleLen, rTmp, hsLen, r10); r10 kind of stands out here. You could say nMinusK in this assert. The assert following to this one is checking for nMinusK==r10 so that should suffice. 
BTW, didn't see anything in the code below that needs nMinusK to be r10. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1120: > 1118: #define cmp_0 XMM_TMP3 > 1119: #undef cmp_k > 1120: #define cmp_k XMM_TMP4 XMM_TMP4 is not reused so cmp_k could be declared as const. In general, limiting undef/define pairs to reused registers would make the review easier. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1125: > 1123: #undef lastMask > 1124: > 1125: int sizeIncr = isU ? 2 : 1; sizeIncr and scale seem to be the same; we could just use one of them in this function. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1178: > 1176: __ andq(eq_mask, lastMask); > 1177: if (needToSaveRCX) { > 1178: __ movdq(rcx, saveRCX); movdq is an expensive instruction (about 3 cycles). If we have another GPR temporary available here for shiftVal, then we don't need to save/restore rcx. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1183: > 1181: > 1182: if (bytesToCompare > 2) { > 1183: if (size > (isU ?
4 : 2)) { this and other usages could be simplified to: size > 2 * scale ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599201163 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599203881 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599211645 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599202848 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599242323 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1599228299 From duke at openjdk.org Tue May 14 01:03:09 2024 From: duke at openjdk.org (ExE Boss) Date: Tue, 14 May 2024 01:03:09 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 11:47:38 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. 
>> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision: > > - Fix another typo > - Fix typo > - Add more comments src/hotspot/share/prims/nativeLookup.cpp line 275: > 273: > 274: // Otherwise call static method findNative in ClassLoader > 275: Suggestion: src/hotspot/share/prims/nativeLookup.cpp line 419: > 417: if (entry != nullptr) return entry; > 418: > 419: Suggestion: src/hotspot/share/prims/nativeLookup.cpp line 426: > 424: return nullptr; > 425: } > 426: } Suggestion: } src/java.base/share/classes/java/lang/Module.java line 331: > 329: String modflag = isNamed() ? getName() : "ALL-UNNAMED"; > 330: String caller = currentClass != null ? 
currentClass.getName() : "code"; > 331: System.err.printf(""" This message should probably be different when linking native methods, since otherwise it'll be: WARNING: A restricted method in foo has been called WARNING: bar has been called by Baz in Baz WARNING: Use --enable-native-access=foo to avoid a warning for callers in this module WARNING: Restricted methods will be blocked in a future release unless native access is enabled when it should really be something like: WARNING: A JNI native method in foo has been linked WARNING: bar has been linked in Baz WARNING: Use --enable-native-access=foo to avoid a warning for native methods in this module WARNING: Native methods will be blocked in a future release unless native access is enabled ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1599248442 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1599248501 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1599248577 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1599253428 From duke at openjdk.org Tue May 14 02:28:06 2024 From: duke at openjdk.org (kuaiwei) Date: Tue, 14 May 2024 02:28:06 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v7] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 08:02:30 GMT, Andrew Haley wrote: > > @theRealAph Could you help review this PR? Thanks. > > I think we should go with your original simple patch for now. Trying to make the Assembler do the optimal thing has not turned out to be very easy, and I'm worried it's too much of a maintenance burden. Simply emitting `dmb st; dmb ld` for releasing stores is enough for now. > > Thank you for trying to make this work. I still have in my mind that there might be an easy way to do it, but it's looking unlikely. I also think that pending instructions carries some risk, and this PR has already had a lot of discussion about the state machine. I will submit a new PR with my original patch.
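For readers less familiar with the term, a release barrier has to keep earlier loads and stores from being reordered after the store that follows it. On AArch64, `dmb ishld` orders earlier loads against later loads and stores, and `dmb ishst` orders earlier stores against later stores, which is why the pair can stand in for a release barrier. The sketch below shows the same ordering requirement in portable C++; it is not HotSpot code, the names `payload`, `ready`, `publish` and `try_consume` are made up for illustration, and how a given compiler lowers the fences is implementation-specific.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state, for illustration only.
static int payload = 0;
static std::atomic<bool> ready{false};

// Writer side: the release fence keeps the payload write (and any earlier
// loads) from being reordered after the flag store that follows it.
void publish(int value) {
  payload = value;
  std::atomic_thread_fence(std::memory_order_release);
  ready.store(true, std::memory_order_relaxed);
}

// Reader side: the acquire fence pairs with the release fence above, so a
// reader that observes ready == true also observes the payload write.
bool try_consume(int& out) {
  if (!ready.load(std::memory_order_relaxed)) {
    return false;
  }
  std::atomic_thread_fence(std::memory_order_acquire);
  out = payload;
  return true;
}
```

One thread would call `publish(42)` while another polls `try_consume(v)`; once the poll returns true, `v` holds the published value.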
------------- PR Comment: https://git.openjdk.org/jdk/pull/18467#issuecomment-2109155105 From fyang at openjdk.org Tue May 14 02:41:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 May 2024 02:41:03 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:20:30 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. 
(also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use la() instead movptr where ok. So now `MacroAssembler::rt_call` is effectively this after this PR change: void MacroAssembler::rt_call(address dest, Register tmp) { RuntimeAddress target(dest); relocate(target.rspec(), [&] { int32_t offset; la(tmp, target.target(), offset); jalr(tmp, offset); }); } Here are some of my thoughts on the next step/cleanup for reference: 1. For `MacroAssembler::far_call` and `MacroAssembler::far_jump`, I would suggest we use direct `auipc` instead of `la` for them as the destination is expected to be in code cache. This will help distinguish these two functions from `MacroAssembler::rt_call`. 2. Since there are only a few uses of `MacroAssembler::call`, we might want to remove this function replacing its callsites with `MacroAssembler::rt_call`. This would help avoid possible confusion between `call` and `rt_call`. I don't think the relocation added by `rt_call` would make a difference. 3. As we discussed in this PR, we also want to fix Shenandoah related callsites to `call_VM_leaf`. And let `call_VM_leaf` to use `rt_call` so that we emit `auipc` when possible, which should be better in performance. 4. We might want to turn some other places in the code where we do `mv + jalr` into a `rt_call`. 
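The `call`/`tail` rows quoted above split a pc-relative offset into `offset[31:12]` for `auipc` and `offset[11:0]` for `jalr`. Because `jalr` sign-extends its 12-bit immediate, the upper part has to be rounded rather than simply truncated. A minimal sketch of that arithmetic, assuming a signed 32-bit offset; `PcRelSplit`, `split_pcrel` and `rejoin` are hypothetical names for illustration and are not HotSpot functions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types and functions, for illustration only.
struct PcRelSplit {
  int32_t hi20;  // value for auipc's upper immediate (in units of 4096)
  int32_t lo12;  // signed 12-bit immediate for jalr, in [-2048, 2047]
};

PcRelSplit split_pcrel(int32_t offset) {
  int64_t wide = offset;
  // Sign-extend the low 12 bits, mirroring what jalr does with its immediate.
  int64_t lo = ((wide & 0xFFF) ^ 0x800) - 0x800;
  // wide - lo is an exact multiple of 4096, so the division has no remainder.
  int64_t hi = (wide - lo) / 4096;
  // Note: for offsets within 2048 of INT32_MAX, hi no longer fits in a signed
  // 20-bit field, so such offsets are out of reach of a single auipc + jalr.
  return PcRelSplit{static_cast<int32_t>(hi), static_cast<int32_t>(lo)};
}

// Recombine the two parts the way the hardware would: (hi << 12) + sext(lo).
int64_t rejoin(PcRelSplit s) {
  return static_cast<int64_t>(s.hi20) * 4096 + s.lo12;
}
```

For example, an offset of 2048 splits into `hi20 = 1`, `lo12 = -2048`, not `hi20 = 0`, `lo12 = 2048`, since 2048 does not fit in a signed 12-bit field.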
------------- PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2109162337 From kbarrett at openjdk.org Tue May 14 02:55:05 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 14 May 2024 02:55:05 GMT Subject: RFR: 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]] In-Reply-To: References: Message-ID: On Thu, 15 Feb 2024 09:10:51 GMT, Julian Waters wrote: > With clang 13 being the minimum required JDK-8325878, the noreturn bug that requires the ATTRIBUTE_NORETURN workaround now vanishes, and we can use [[noreturn]] directly within HotSpot. We should remove the workaround as soon as possible, given the chance Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17868#pullrequestreview-2054155854 From duke at openjdk.org Tue May 14 06:01:08 2024 From: duke at openjdk.org (xiaotaonan) Date: Tue, 14 May 2024 06:01:08 GMT Subject: RFR: 8301464: Code in GenFullCP is still disabled after JDK-8079697 was fixed Message-ID: Code in GenFullCP is still disabled after JDK-8079697 was fixed note:I have not found any relevant information on why ClassWriter.COMPUTE_FRAMES is disabled in JDK-8079697. ------------- Commit messages: - Code in GenFullCP is still disabled after JDK-8079697 was fixed Changes: https://git.openjdk.org/jdk/pull/19228/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19228&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301464 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19228.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19228/head:pull/19228 PR: https://git.openjdk.org/jdk/pull/19228 From rehn at openjdk.org Tue May 14 06:19:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 14 May 2024 06:19:04 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:20:30 GMT, Robbin Ehn wrote: >> Hi, please consider. 
>> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. 
(VF2, qemu, LP4) >> Re-running tests, had some last minute changes. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use la() instead movptr where ok. Yes, in general agreed. One issue is that RV MASM often have lower-case address, where other platforms use Address, i.e. AddressLiteral on x86. I think we should use upper case Address as then we can check for relocation and produce all different code patterns with same function, li/auipc/movptr. While looking at this it seem like we have some funny things: src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp: __ rt_call(Runtime1::entry_for(stub_id), ra); src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp: __ far_call(RuntimeAddress(Runtime1::entry_for(_stub_id))); Note that 'mv reg, imm' is not a recognized mnemonic (should be 'li reg, imm') AFAIK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2109368805 From stuefe at openjdk.org Tue May 14 07:01:03 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 May 2024 07:01:03 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: <3E6BcBm0RVHLAGmuNjoYoeNs5-JLcYT5KgKVfmcxYAc=.4537bffa-ea3c-43f2-bc13-16710444b355@github.com> <6qHBMgu-ZgoMPPym-sSJJR9szoVattW-JuHeXaX4JY0=.05710a47-19fe-48ba-920c-218c16473fbb@github.com> Message-ID: On Mon, 13 May 2024 15:48:43 GMT, Stefan Karlsson wrote: >> I tend to agree with that. My earlier question still stands: is there a better place to put it? Right now the "enforced with code" in a stand-alone file doesn't tell me "why" this rule is important. > > If you want to keep the static_assert it in the .cpp file, then I won't block that. > Could you instead put the static_assert near the code that will break? I like that. I will do that. 
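The thread does not show what the `static_assert` in question checks, so the following is only a generic illustration of the suggestion to keep such an assertion next to the code that would break. The enum `MemTag`, the table `tag_names` and the function `tag_name` are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a flag enum; Count must stay last.
enum class MemTag : unsigned char { JavaHeap, Class, Thread, Code, Count };

constexpr std::size_t mem_tag_count = static_cast<std::size_t>(MemTag::Count);

// This table is the code that silently breaks if the enum grows.
static const char* const tag_names[] = { "Java Heap", "Class", "Thread", "Code" };

// The assertion sits right next to the table, so a failure points the reader
// at the place that needs updating and explains why the rule exists.
static_assert(sizeof(tag_names) / sizeof(tag_names[0]) == mem_tag_count,
              "tag_names must have one entry per MemTag value");

const char* tag_name(MemTag t) {
  return tag_names[static_cast<std::size_t>(t)];
}
```

Adding a new enumerator without extending `tag_names` then fails the build at the table itself, not in an unrelated file.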
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1599476559 From stuefe at openjdk.org Tue May 14 07:01:03 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 May 2024 07:01:03 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: <7X2GeAXsh_ukEC7SeVR_U0VlLveqVb7qOrjrO0cK76U=.8db41eba-86fd-4943-b403-283b467f0392@github.com> On Mon, 13 May 2024 14:47:18 GMT, Stefan Karlsson wrote: >> I don't feel like starting that particular bike shedding discussion :) But sure, sometime in the future we should do this. Here, I want it to be a simple renaming change. > > Right. That's why I prefixed this with "Open-ended comment/question", trying to make it super clear that it wasn't intended as a request for this PR, but rather a way to at least plant the seed of an idea that we might want to fix this eyesore. I agree with you on the eyesore. MEMFLAGS does not follow any established convention, the implied plural is strange (its just one flag, not a set of), etc. We will change it sometime in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1599478327 From fyang at openjdk.org Tue May 14 07:19:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 May 2024 07:19:05 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 02:33:13 GMT, Fei Yang wrote: > RuntimeAddress target(dest); Yeah, I know what you mean. As you see, the `address` passed to `rt_call` is used to construct a `RuntimeAddress` in this function which is also an `Address`. It seems that it will be more consistent to build a `RuntimeAddress` and pass that to `rt_call`, which is similar with others like `far_call`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2109450998 From stuefe at openjdk.org Tue May 14 07:19:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 May 2024 07:19:32 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v3] In-Reply-To: References: Message-ID: <-CWxZnLMZoA9GeqM-hJ8m2d8-HWDQ7bVRhoWbf80MTE=.e77cd29b-70f9-4732-b04c-12a333bc559c@github.com> > MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. 
> > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback StefanK ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19172/files - new: https://git.openjdk.org/jdk/pull/19172/files/42361558..2fc98923 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19172&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19172&range=01-02 Stats: 41 lines in 4 files changed: 7 ins; 34 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19172/head:pull/19172 PR: https://git.openjdk.org/jdk/pull/19172 From stuefe at openjdk.org Tue May 14 07:19:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 May 2024 07:19:32 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v2] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 10:17:57 GMT, Stefan Karlsson wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Update mallocLimit.hpp > > Changes requested by stefank (Reviewer). @stefank New version, hopefully addressed all your remarks. Thanks! > src/hotspot/share/nmt/memflags.cpp line 27: > >> 25: #include "precompiled.hpp" >> 26: >> 27: #include "nmt/memflags.hpp" > > There should be no blankline between precompiled.hpp and the rest of the includes. I removed the file. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19172#issuecomment-2109454676 PR Review Comment: https://git.openjdk.org/jdk/pull/19172#discussion_r1599498716 From dholmes at openjdk.org Tue May 14 07:51:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 May 2024 07:51:01 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't print summary during VM shutdown hook In-Reply-To: References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> Message-ID: On Mon, 13 May 2024 15:53:26 GMT, Leonid Mesnik wrote: > Every log (as any Finalazible object) is registered using registerCleanup() But you have changed Log so it is no longer a FinalizableObject. ?? Ah I see this is what you meant by disabling it. Now a Log is a plain old Java object with no special cleanup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19209#issuecomment-2109509115 From dholmes at openjdk.org Tue May 14 08:42:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 May 2024 08:42:03 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't print summary during VM shutdown hook In-Reply-To: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> Message-ID: On Sun, 12 May 2024 21:34:41 GMT, Leonid Mesnik wrote: > The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook. > The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 > > The cleanup() method flush stream and print summary which should be already printed by complain method. > > This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. 
However, the error should have been printed already in this method. > > So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests. > > Note: The 'verboseOnErrorEnabled' is just not used. > > See isVerboseOnErrorEnabled. > > public boolean isVerboseOnErrorEnabled() { > return errorsSummaryEnabled; > } > > > Tested with by running tests with different combinations (tier4-7) and tier1. Okay - seems fine. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19209#pullrequestreview-2054686733 From duke at openjdk.org Tue May 14 10:06:03 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 14 May 2024 10:06:03 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <67V-hXengkU80wniJ-WIobK5jxJIdM08xu8kbYJf8Ro=.28045233-8987-41bb-af41-da2baac07d2d@github.com> On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov wrote: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
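The two-part structure described above (a blocked main loop plus a scalar tail) can be modeled in portable C++. This is only a sketch under stated assumptions, not the AArch64 Neon code from the PR: the names `hash_scalar` and `hash_blocked` are hypothetical, the block width of 4 is chosen for illustration, and the initial value 1 with multiplier 31 follows `Arrays.hashCode`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference form: h = 31 * h + element, starting from 1.
// Unsigned arithmetic gives the 32-bit wraparound that Java's int has.
inline int32_t hash_scalar(const std::vector<int32_t>& a) {
  uint32_t h = 1;
  for (int32_t e : a) {
    h = 31u * h + static_cast<uint32_t>(e);
  }
  return static_cast<int32_t>(h);
}

// Blocked form (hypothetical model): four elements per iteration, each
// weighted by a precomputed power of 31, followed by a scalar tail loop.
inline int32_t hash_blocked(const std::vector<int32_t>& a) {
  constexpr uint32_t P1 = 31u;
  constexpr uint32_t P2 = P1 * P1;
  constexpr uint32_t P3 = P2 * P1;
  constexpr uint32_t P4 = P3 * P1;
  uint32_t h = 1;
  std::size_t i = 0;
  for (; i + 4 <= a.size(); i += 4) {
    h = h * P4
      + static_cast<uint32_t>(a[i])     * P3
      + static_cast<uint32_t>(a[i + 1]) * P2
      + static_cast<uint32_t>(a[i + 2]) * P1
      + static_cast<uint32_t>(a[i + 3]);
  }
  // Tail: the remaining 0 to 3 elements are folded in one at a time.
  for (; i < a.size(); ++i) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return static_cast<int32_t>(h);
}
```

Both functions should agree for every input length, because the blocked update is just four scalar steps expanded algebraically; a real vector implementation would compute the four products in separate lanes.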
>
> # Performance
>
> ## Neoverse N1
>
> --------------------------------------------------------------------------------------------
>   Version                                        Baseline               This patch
> --------------------------------------------------------------------------------------------
>   Benchmark                  (size)  Mode  Cnt      Score   Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
>   ArraysHashCode.bytes            1  avgt   15      1.249 ± 0.060      1.247 ±   0.062  ns/op
>   ArraysHashCode.bytes           10  avgt   15      8.754 ± 0.028      4.387 ±   0.015  ns/op
>   ArraysHashCode.bytes          100  avgt   15     98.596 ± 0.051     26.655 ±   0.097  ns/op
>   ArraysHashCode.bytes        10000  avgt   15  10150.578 ± 1.352   2649.962 ± 216.744  ns/op
>   ArraysHashCode.chars            1  avgt   15      1.286 ± 0.062      1.246 ±   0.054  ns/op
>   ArraysHashCode.chars           10  avgt   15      8.731 ± 0.002      5.344 ±   0.003  ns/op
>   ArraysHashCode.chars          100  avgt   15     98.632 ± 0.048     23.023 ±   0.142  ns/op
>   ArraysHashCode.chars        10000  avgt   15  10150.658 ± 3.374   2410.504 ±   8.872  ns/op
>   ArraysHashCode.ints             1  avgt   15      1.189 ± 0.005      1.187 ±   0.001  ns/op
>   ArraysHashCode.ints            10  avgt   15      8.730 ± 0.002      5.676 ±   0.001  ns/op
>   ArraysHashCode.ints           100  avgt   15     98.559 ± 0.016     24.378 ±   0.006  ns/op
>   ArraysHashCode.ints         10000  avgt   15  10148.752 ± 1.336   2419.015 ±   0.492  ns/op
>   ArraysHashCode.multibytes       1  avgt   15      1.037 ± 0.001      1.037 ±   0.001  ns/op
>   ArraysHashCode.multibytes      10  avgt   15      5.4...

Just as a note to not miss it later: the implementation might be affected by https://bugs.openjdk.org/browse/JDK-8139457

I'm finishing up a patch, hopefully I'll push it later today.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2109791736
PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2109794609

From jsjolen at openjdk.org Tue May 14 10:27:25 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Tue, 14 May 2024 10:27:25 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v78]
In-Reply-To:
References:
Message-ID: <5Mr-DQJRQe3BWTBBq11njD39zewp7utObr0mUf3bTFI=.2b699c09-2cc8-400e-9eae-965212acf2fb@github.com>

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

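To make the accounting in the quoted description concrete, here is a toy model of a separate address space whose committed bytes are summed per memory flag. It is a sketch under stated assumptions, not HotSpot code: `ToyFlag`, `ToyMemoryFile`, `allocate`, `release`, and `committed` are hypothetical names, ranges are assumed never to overlap, and a release must name exactly one previously recorded range.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a memory flag; not HotSpot's MEMFLAGS.
enum class ToyFlag { JavaHeap, Other };

// Toy model of one "device": an address space indexed by offset rather than
// by virtual address, with committed bytes accumulated per flag.
class ToyMemoryFile {
 public:
  explicit ToyMemoryFile(std::string name) : _name(std::move(name)) {}

  const std::string& name() const { return _name; }

  // Record [offset, offset + size) as committed under `flag`.
  // Assumption: the range does not overlap any recorded range.
  void allocate(std::size_t offset, std::size_t size, ToyFlag flag) {
    _regions[offset] = Region{size, flag};
    _committed[flag] += size;
  }

  // Remove a range recorded earlier with exactly this offset and size.
  // Returns false (and changes nothing) if no such range exists.
  bool release(std::size_t offset, std::size_t size) {
    auto it = _regions.find(offset);
    if (it == _regions.end() || it->second.size != size) {
      return false;
    }
    _committed[it->second.flag] -= size;
    _regions.erase(it);
    return true;
  }

  // Summary view: total committed bytes attributed to `flag`.
  std::size_t committed(ToyFlag flag) const {
    auto it = _committed.find(flag);
    return it == _committed.end() ? 0 : it->second;
  }

 private:
  struct Region {
    std::size_t size;
    ToyFlag flag;
  };
  std::string _name;
  std::map<std::size_t, Region> _regions;
  std::map<ToyFlag, std::size_t> _committed;
};
```

A caller would create one `ToyMemoryFile` per backing file, call `allocate` on commit and `release` on uncommit, and read `committed` when producing a summary. Splitting and merging of partially overlapping ranges, which a real tracker has to handle, is deliberately left out.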
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Remove calls to verify_self ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/327094bc..115f7e57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=77 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=76-77 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From dchuyko at openjdk.org Tue May 14 10:48:04 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Tue, 14 May 2024 10:48:04 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 21:08:08 GMT, Evgeny Astigeevich wrote: >> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749). >> >> Found bugs: >> - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack. As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. >> - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. >> >> There are other concerns: bugs and performance issues. >> >> Possible bugs: >> - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`.
If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. >> - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. >> - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. >> >> Performance issues: >> - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. >> >> The backout is not clean because of removal of `CompiledMethod`. >> >> Tested with release and fastdebug builds: tier1 and tier2 passed. > > What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? > I agree with this backout. Thank you @eastig for explaining your point. We have about 3 weeks before RDP1 and it is better we have less issues before that. Let redo implementation in next release taking into account the issues you found and have more time for testing. OK. I hope it takes less time to get back into the source tree than it did initially. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2109874596 From jsjolen at openjdk.org Tue May 14 10:48:41 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 14 May 2024 10:48:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v79] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. 
This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
> > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: At end of remove_all set _root to nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/115f7e57..e8f33dfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=78 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=77-78 Stats: 3 lines in 2 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jwaters at openjdk.org Tue May 14 10:54:04 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 14 May 2024 10:54:04 GMT Subject: RFR: 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]] In-Reply-To: References: Message-ID: <7gb1dCW9UzLuKvskIyRbexhP9RPCSLPnJ9_qdVO-sZk=.16337a28-935f-45ba-a343-a4d0fa96cbc8@github.com> On Thu, 15 Feb 2024 09:10:51 GMT, Julian Waters wrote: > With clang 13 being the minimum required JDK-8325878, the noreturn bug that requires the ATTRIBUTE_NORETURN workaround now vanishes, and we can use [[noreturn]] directly within HotSpot. We should remove the workaround as soon as possible, given the chance Thanks Kim. Anyone else want to be the second reviewer?
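For readers unfamiliar with the standard attribute discussed in the message above, here is a minimal sketch of `[[noreturn]]` in plain C++. The functions `toy_fatal` and `checked_divide` are hypothetical and are not HotSpot code; the example only illustrates the language feature, not the change made in the PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: it always throws, so control never returns to the
// caller. That is the promise [[noreturn]] makes to the compiler.
[[noreturn]] inline void toy_fatal(const std::string& message) {
  throw std::runtime_error(message);
}

// Because toy_fatal is marked [[noreturn]], compilers typically do not warn
// about a missing return statement on the error path below.
inline int checked_divide(int numerator, int denominator) {
  if (denominator != 0) {
    return numerator / denominator;
  }
  toy_fatal("division by zero");
}
```

A macro wrapper such as the `ATTRIBUTE_NORETURN` mentioned in the thread serves the same purpose as the attribute; writing the attribute directly removes one level of indirection once every supported compiler handles it correctly.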
------------- PR Comment: https://git.openjdk.org/jdk/pull/17868#issuecomment-2109889652 From jsjolen at openjdk.org Tue May 14 11:24:41 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 14 May 2024 11:24:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v80] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. 
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Add a ResourceMark - More testing of the treap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/e8f33dfa..fb42fb0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=79 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=78-79 Stats: 26 lines in 1 file changed: 26 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From aboldtch at openjdk.org Tue May 14 11:26:29 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 May 2024 11:26:29 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v6] In-Reply-To: References: Message-ID: > This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. 
[JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8326957 - Revert "Default to non generational ZGC with JVMCI" This reverts commit 4de30da2661c64ae00d13a5e15a1a61fcb667a10. - Default to non generational ZGC with JVMCI - Merge tag 'jdk-23+21' into JDK-8326957 Added tag jdk-23+21 for changeset e833bfc8 - Merge tag 'jdk-23+19' into JDK-8326957 Added tag jdk-23+19 for changeset 706b421c - Remove extra space - Use consistent terminology - Merge tag 'jdk-23+17' into JDK-8326957 Added tag jdk-23+17 for changeset 8efd7aa6 - Merge tag 'jdk-23+16' into JDK-8326957 Added tag jdk-23+16 for changeset d580bcf9 - Update VMDeprecatedOptions.java test - ... and 1 more: https://git.openjdk.org/jdk/compare/5ded8da6...60460dce ------------- Changes: https://git.openjdk.org/jdk/pull/18393/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18393&range=05 Stats: 107 lines in 7 files changed: 105 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18393/head:pull/18393 PR: https://git.openjdk.org/jdk/pull/18393 From stefank at openjdk.org Tue May 14 11:26:29 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 14 May 2024 11:26:29 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v6] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 11:23:31 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. [JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8326957 > - Revert "Default to non generational ZGC with JVMCI" > > This reverts commit 4de30da2661c64ae00d13a5e15a1a61fcb667a10. > - Default to non generational ZGC with JVMCI > - Merge tag 'jdk-23+21' into JDK-8326957 > > Added tag jdk-23+21 for changeset e833bfc8 > - Merge tag 'jdk-23+19' into JDK-8326957 > > Added tag jdk-23+19 for changeset 706b421c > - Remove extra space > - Use consistent terminology > - Merge tag 'jdk-23+17' into JDK-8326957 > > Added tag jdk-23+17 for changeset 8efd7aa6 > - Merge tag 'jdk-23+16' into JDK-8326957 > > Added tag jdk-23+16 for changeset d580bcf9 > - Update VMDeprecatedOptions.java test > - ... and 1 more: https://git.openjdk.org/jdk/compare/5ded8da6...60460dce Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18393#pullrequestreview-2055065442 From jsjolen at openjdk.org Tue May 14 11:33:27 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 14 May 2024 11:33:27 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v81] In-Reply-To: References: Message-ID: <1tIXPVo88Ra38sYtM8fBqeEf8wQ0uwO7frb2AORBXRw=.644542dd-1e8f-4cee-88bf-2c0ae541931c@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Test with opposite ordering - Of course you need to ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/fb42fb0d..57ce5254 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=80 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=79-80 Stats: 75 lines in 1 file changed: 27 ins; 6 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Tue May 14 11:43:41 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 14 May 2024 11:43:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v82] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Off-by-one error - Fix iteration order - Fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/57ce5254..d7ab6001 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=81 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=80-81 Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From aph at openjdk.org Tue May 14 12:17:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 14 May 2024 12:17:05 GMT Subject: RFR: 8332066: AArch64: Math test failures since JDK-8331558 In-Reply-To: References: Message-ID: On Mon, 13 May 2024 15:11:19 GMT, Daniel D. Daugherty wrote: > [JDK-8332066](https://bugs.openjdk.org/browse/JDK-8332066) should have been renamed to: Mmm, but the core problems are that this is a manual process, it's different from the "normal" GitHub backouts and, worst of all, can't be corrected if anyone messes up the manual process. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19177#issuecomment-2110048952 From aboldtch at openjdk.org Tue May 14 13:14:10 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 May 2024 13:14:10 GMT Subject: RFR: 8326957: Implement JEP 474: ZGC: Generational Mode by Default [v6] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 11:26:29 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details.
[JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8326957 > - Revert "Default to non generational ZGC with JVMCI" > > This reverts commit 4de30da2661c64ae00d13a5e15a1a61fcb667a10. > - Default to non generational ZGC with JVMCI > - Merge tag 'jdk-23+21' into JDK-8326957 > > Added tag jdk-23+21 for changeset e833bfc8 > - Merge tag 'jdk-23+19' into JDK-8326957 > > Added tag jdk-23+19 for changeset 706b421c > - Remove extra space > - Use consistent terminology > - Merge tag 'jdk-23+17' into JDK-8326957 > > Added tag jdk-23+17 for changeset 8efd7aa6 > - Merge tag 'jdk-23+16' into JDK-8326957 > > Added tag jdk-23+16 for changeset d580bcf9 > - Update VMDeprecatedOptions.java test > - ... and 1 more: https://git.openjdk.org/jdk/compare/5ded8da6...60460dce Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18393#issuecomment-2110201771 From aboldtch at openjdk.org Tue May 14 13:14:11 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 May 2024 13:14:11 GMT Subject: Integrated: 8326957: Implement JEP 474: ZGC: Generational Mode by Default In-Reply-To: References: Message-ID: On Wed, 20 Mar 2024 09:24:51 GMT, Axel Boldt-Christmas wrote: > This is the implementation task for `JEP 474: ZGC: Generational Mode by Default`. See the JEP for details. [JDK-8326667](https://bugs.openjdk.org/browse/JDK-8326667) This pull request has now been integrated. 
Changeset: 4ba74475 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/4ba74475d44831c1fe49359458163cd1567e9619 Stats: 107 lines in 7 files changed: 105 ins; 0 del; 2 mod 8326957: Implement JEP 474: ZGC: Generational Mode by Default Reviewed-by: stefank, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/18393 From jsjolen at openjdk.org Tue May 14 13:38:28 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 14 May 2024 13:38:28 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v83] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Remove is_noop() superfluous check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/d7ab6001..86da2444 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=82 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=81-82 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Tue May 14 13:54:38 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 14 May 2024 13:54:38 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v84] In-Reply-To: References: Message-ID: <6IlvXnA9cfDmRbDo-vOM7S-TrSvnr0JlbqVO-zoF3lo=.89c6b61f-a415-4a30-afbe-b362ec5f7cdb@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Test find ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/86da2444..fd953805 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=83 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=82-83 Stats: 31 lines in 1 file changed: 31 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From stefank at openjdk.org Tue May 14 14:50:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 14 May 2024 14:50:04 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v3] In-Reply-To: <-CWxZnLMZoA9GeqM-hJ8m2d8-HWDQ7bVRhoWbf80MTE=.e77cd29b-70f9-4732-b04c-12a333bc559c@github.com> References: <-CWxZnLMZoA9GeqM-hJ8m2d8-HWDQ7bVRhoWbf80MTE=.e77cd29b-70f9-4732-b04c-12a333bc559c@github.com> Message-ID: On Tue, 14 May 2024 07:19:32 GMT, Thomas Stuefe wrote: >> MEMFLAGS, as well as its enum constants, should live in its own include. >> >> The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. >> >> The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. >> >> --- >> >> This patch moves the enum to its new file. >> >> It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. >> >> For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp.
>> >> I tested (built) on: >> - MacOS aarch64, no precompiled headers, fastdebug >> - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback StefanK Looks good. Thanks! ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19172#pullrequestreview-2055637663 From stuefe at openjdk.org Tue May 14 15:02:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 May 2024 15:02:11 GMT Subject: RFR: 8332042: Move MEMFLAGS to its own include file [v3] In-Reply-To: <-CWxZnLMZoA9GeqM-hJ8m2d8-HWDQ7bVRhoWbf80MTE=.e77cd29b-70f9-4732-b04c-12a333bc559c@github.com> References: <-CWxZnLMZoA9GeqM-hJ8m2d8-HWDQ7bVRhoWbf80MTE=.e77cd29b-70f9-4732-b04c-12a333bc559c@github.com> Message-ID: On Tue, 14 May 2024 07:19:32 GMT, Thomas Stuefe wrote: >> MEMFLAGS, as well as its enum constants, should live in its own include. >> >> The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. >> >> The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. >> >> --- >> >> This patch moves the enum to its new file. >> >> It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. >> >> For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. 
>> >> I tested (built) on: >> - MacOS aarch64, no precompiled headers, fastdebug >> - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback StefanK Thanks @afshin-zafari @stefank @kimbarrett @jdksjolen ------------- PR Comment: https://git.openjdk.org/jdk/pull/19172#issuecomment-2110464347 From stuefe at openjdk.org Tue May 14 15:02:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 May 2024 15:02:12 GMT Subject: Integrated: 8332042: Move MEMFLAGS to its own include file In-Reply-To: References: Message-ID: On Fri, 10 May 2024 09:06:08 GMT, Thomas Stuefe wrote: > MEMFLAGS, as well as its enum constants, should live in its own include. > > The constants are used throughout the code base, often without needing the allocation APIs exposed through allocation.hpp. > > The MEMFLAGS enum def is often needed within NMT itself, again often without needing allocation.hpp. > > --- > > This patch moves the enum to its new file. > > It fixes those `allocation.hpp` includes that where only needed to get MEMFLAGS. It does not fix other includes. > > For backward compatibility, until we straightened out the dependencies (e.g., fixing all places where we rely on indirect includes), I added memflags.hpp to allocation.hpp. > > I tested (built) on: > - MacOS aarch64, no precompiled headers, fastdebug > - Linux x64, no precompiled headers, fastdebug, release, fastdebug crossbuild to aarch64, fastdebug minimal This pull request has now been integrated. 
Changeset: 95a60131 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/95a601316de06b4b0fbf6e3c7777be5d2a1ca978 Stats: 201 lines in 25 files changed: 99 ins; 66 del; 36 mod 8332042: Move MEMFLAGS to its own include file Reviewed-by: jsjolen, stefank ------------- PR: https://git.openjdk.org/jdk/pull/19172 From jsjolen at openjdk.org Tue May 14 15:54:17 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 14 May 2024 15:54:17 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: On Thu, 9 May 2024 11:35:29 GMT, Thomas Stuefe wrote: >> And I would not rely on the internal verification here, but call verify explicitly. (see also note in treap add/remove about verification) > > Please also test the scoped find function with different sets (eg. empty set, 1 item set etc). I've added a fair amount of tests to this code now. I still need to add some more for iterators, at least the closest_leq. Most of the tests use `int` (so not `size_t`), I wrote the `find` tests using `float` instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1600281887 From jsjolen at openjdk.org Tue May 14 16:23:25 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 14 May 2024 16:23:25 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v85] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Test closest_leq ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/fd953805..8168b388 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=84 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=83-84 Stats: 23 lines in 1 file changed: 23 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Tue May 14 16:23:26 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 14 May 2024 16:23:26 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: <36UojYkDr9uWmWb_n6ilASrGqOGDuRDsGbfclma5fKQ=.546479d7-96e0-493c-918e-1860de1200e9@github.com> <3PMzmANSCkXiAHX1DgXrTrOgSM9dwzshODQWL99Hlt0=.60c3162b-6df4-4166-bc04-602fe8c83c10@github.com> Message-ID: On Mon, 13 May 2024 12:51:45 GMT, Thomas Stuefe wrote: >> Afshin mentioned this too, I believe, but I want to push back here. I prefer showing what we're doing here (simple comparison) rather than hiding it behind a utility function. Requires less jumping around when reading unknown code. > > Tiny utility functions like this are effectively resolved in modern C++ IDEs like CDS. (In stark contrast to templates, which make IDEs very confused). And a clear name provides safety against accidental typos. > > I leave this up to you. Thanks, leaving them as they are. Even with jump-to-def, it bothers me :-). I might be the only one.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1600322153 From jsjolen at openjdk.org Tue May 14 16:23:26 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 14 May 2024 16:23:26 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v74] In-Reply-To: References: Message-ID: <5JxNCZ3qXJqrt3U3IShrA5mxwWT3HSfoG0GVAMdexFQ=.248a224e-cb98-422b-b0f3-c4e14572bc6d@github.com> On Tue, 14 May 2024 15:51:50 GMT, Johan Sjölen wrote: >> Please also test the scoped find function with different sets (eg. empty set, 1 item set etc). > > I've added a fair amount of tests to this code now. I still need to add some more for iterators, at least the closest_leq. Most of the tests use `int` (so not `size_t`), I wrote the `find` tests using `float` instead. OK, very basic closest_leq tests added. For verify_self, I've removed all calls in the treap-code itself and let a small hammering test run the verify_self code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1600320789 From cjplummer at openjdk.org Tue May 14 17:54:05 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 14 May 2024 17:54:05 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v3] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> <2lhm2l4CzUnyStTj215njaZg9EcMwwKWxMxtdZTXD8I=.ba8b1275-f16c-4af4-80e5-81ace9b40aa2@github.com> Message-ID: On Fri, 3 May 2024 10:42:45 GMT, Serguei Spitsyn wrote: >> ...and there are also comments above with this issue. > >> expEnteringCount/expWaitingCount contain the tested patterns. > > I kind of disagree. > Please, take look at the loop below: > > for (int i = 0; i < NUMBER_OF_WAITING_THREADS; i++) { > expEnteringCount = isVirtual ?
0 : NUMBER_OF_ENTERING_THREADS + i + 1;
>         expWaitingCount = isVirtual ? 0 : NUMBER_OF_WAITING_THREADS - i - 1;
>         lockCheck.notify(); // notify waiting threads one by one
>         // now the notified WaitingTask has to be blocked on the lockCheck re-enter
>
>         // entry count: 1
>         // count of threads waiting to enter: NUMBER_OF_ENTERING_THREADS
>         // count of threads waiting to re-enter: i + 1
>         // count of threads waiting to be notified: NUMBER_OF_WAITING_THREADS - i - 1
>         check(lockCheck, expOwnerThread(), expEntryCount(),
>               expEnteringCount,
>               expWaitingCount);
>     }
>
> The comment fixed as you suggest does not look useful anymore as the tested pattern is lost:
>
>         // entry count: expOwnerThread()
>         // count of threads waiting to enter: expEnteringCount
>         // count of threads waiting to re-enter: expEntryCount()
>         // count of threads waiting to be notified: expWaitingCount
>         check(lockCheck, expOwnerThread(), expEntryCount(),
>               expEnteringCount,
>               expWaitingCount);
>     }
>
>
> I understand your concern but your suggestion is not that good.
> We could remove these comments but the tested pattern will be thrown away with the comments.
> Would it help if we add clarifications that the comments are correct for platform threads only?

I don't understand the issue with the updated comment. It is precisely telling you what the expected "count" values should be. If you leave the macros in the comment, then the comment is wrong for virtual threads. If you want to keep the macros in the comment, you need to add something like "... or 0 for virtual threads".

BTW, the "re-enter" comment should continue to be "i + 1". I'm not sure why it was changed to "expEntryCount()".
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1600436569 From mcimadamore at openjdk.org Tue May 14 18:10:28 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 14 May 2024 18:10:28 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v4] In-Reply-To: References: Message-ID: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. > > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. 
Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: - Address review comments Improve warning for JNI methods, similar to what's described in JEP 472 Beef up tests - Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/bad10942..0d21bf99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=02-03 Stats: 84 lines in 15 files changed: 42 ins; 14 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From shade at openjdk.org Tue May 14 18:24:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 May 2024 18:24:23 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases Message-ID: As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. 
Additional testing: - [x] Performance test reproducer from the bug improves significantly - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) - [ ] Linux AArch64 server fastdebug, `all` - [x] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Avoid double handle-izing on GC critical path - Move critical section to a closer scope - Comments - Fix Changes: https://git.openjdk.org/jdk/pull/19229/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19229&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331572 Stats: 80 lines in 8 files changed: 42 ins; 13 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/19229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19229/head:pull/19229 PR: https://git.openjdk.org/jdk/pull/19229 From shade at openjdk.org Tue May 14 18:24:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 May 2024 18:24:23 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases In-Reply-To: References: Message-ID: On Tue, 14 May 2024 12:31:08 GMT, Aleksey Shipilev wrote: > As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. > > This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. > > After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. 
I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. > > Additional testing: > - [x] Performance test reproducer from the bug improves significantly > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) > - [ ] Linux AArch64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` Performance note: there is an intrinsic tradeoff here between the cost of acquiring the critical section vs the concurrency it unblocks for non-STW GCs and the cache improvements on non-GC paths. The critical section overhead is mostly due to the fence in https://github.com/openjdk/jdk/blob/5a4415a6bddb25cbd5b87ff8ad1a06179c2e452e/src/hotspot/share/utilities/globalCounter.inline.hpp#L43 So, the original reproducer (very stressy, with lots of interpreter frames) improves dramatically (73 -> 6ms) with Shenandoah GC, but run with Serial GC reveals there is a slight regression in GC times (74 -> 79 ms). I have not been able to replicate this regression in larger benchmarks. Anyhow, this very fine-grained regression nearly disappears (74.1 -> 74.3 ms on Serial) if we optimize the other part of this whole path a bit, done in this PR: https://github.com/openjdk/jdk/pull/19229/commits/455687addeba55dc998dbf9ab4b8ec58f0b69ee4. This also improves Shenandoah times further (6.1 -> 5.6 ms). 
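To make the reader/reclaimer synchronization discussed above concrete, here is a minimal standalone C++ sketch. It is not HotSpot's `GlobalCounter` or `OopMapCache`: `EpochCounter`, `TinyCache`, `Entry`, and the fixed `kMaxReaders` slot count are hypothetical names and assumptions made up for illustration. The idea it shows is only the general one: a reader publishes that it is inside a critical section (the sequentially consistent store stands in for the fence mentioned in the performance note), while the reclaimer unlinks an entry, waits for any reader that might still see it, and only then frees it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified epoch counter (NOT HotSpot's GlobalCounter).
// All atomics use the default sequentially consistent ordering to keep
// the reasoning simple; a production version would be more careful.
class EpochCounter {
 public:
  static constexpr int kMaxReaders = 4;  // assumption: fixed reader slots

  // Reader side: mark the slot active (odd value) at the current epoch.
  void enter(int reader) {
    uint64_t epoch = _global.load();
    _slots[reader].store(epoch | 1);
  }

  // Reader side: mark the slot inactive again.
  void exit(int reader) { _slots[reader].store(0); }

  // Reclaimer side: bump the epoch, then wait for every reader that
  // entered before the bump to leave its critical section.
  void synchronize() {
    uint64_t target = _global.fetch_add(2) + 2;
    for (auto& slot : _slots) {
      for (;;) {
        uint64_t v = slot.load();
        bool inactive = (v & 1) == 0;
        bool entered_after_bump = v >= target;
        if (inactive || entered_after_bump) break;
        // Busy-wait; a real implementation would yield or back off.
      }
    }
  }

 private:
  std::atomic<uint64_t> _global{0};
  std::array<std::atomic<uint64_t>, kMaxReaders> _slots{};
};

struct Entry {
  int key;
  int value;
};

// Hypothetical one-entry cache: readers copy the value out while inside
// the critical section, the writer reclaims the old entry afterwards.
class TinyCache {
 public:
  ~TinyCache() { delete _entry.load(); }

  bool lookup(int reader, int key, int* out) {
    _epoch.enter(reader);
    Entry* e = _entry.load();
    bool hit = (e != nullptr && e->key == key);
    if (hit) *out = e->value;  // copy out before leaving the section
    _epoch.exit(reader);
    return hit;
  }

  void replace(Entry* fresh) {
    Entry* old = _entry.exchange(fresh);  // unlink the old entry
    _epoch.synchronize();                 // wait for readers that may hold it
    delete old;                           // now safe to free
  }

 private:
  EpochCounter _epoch;
  std::atomic<Entry*> _entry{nullptr};
};
```

In this sketch every `lookup` pays for two sequentially consistent stores even when no reclamation is in progress, which is the same kind of per-read overhead the Serial GC numbers above are sensitive to.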
------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2110429057 From cjplummer at openjdk.org Tue May 14 20:12:03 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 14 May 2024 20:12:03 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't print summary during VM shutdown hook In-Reply-To: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> Message-ID: On Sun, 12 May 2024 21:34:41 GMT, Leonid Mesnik wrote: > The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook. > The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 > > The cleanup() method flush stream and print summary which should be already printed by complain method. > > This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. However, the error should have been printed already in this method. > > So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests. > > Note: The 'verboseOnErrorEnabled' is just not used. > > See isVerboseOnErrorEnabled. > > public boolean isVerboseOnErrorEnabled() { > return errorsSummaryEnabled; > } > > > Tested with by running tests with different combinations (tier4-7) and tier1. Copyrights needs updating. test/hotspot/jtreg/vmTestbase/nsk/share/Log.java line 587: > 585: * print a warning message first. > 586: */ > 587: private synchronized void printErrorsSummary() { There is a comment above that still references this method. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19209#pullrequestreview-2056296736 PR Review Comment: https://git.openjdk.org/jdk/pull/19209#discussion_r1600577870 From lmesnik at openjdk.org Tue May 14 22:19:19 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 14 May 2024 22:19:19 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't print summary during VM shutdown hook [v2] In-Reply-To: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> Message-ID: <_SHejzXr7Vq7lwN3XI9-a1ZXsyfKrULFfPVHruXAGkQ=.672bceab-a5fa-4664-9030-dee17113aafc@github.com> > The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook. > The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 > > The cleanup() method flush stream and print summary which should be already printed by complain method. > > This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. However, the error should have been printed already in this method. > > So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests. > > Note: The 'verboseOnErrorEnabled' is just not used. > > See isVerboseOnErrorEnabled. > > public boolean isVerboseOnErrorEnabled() { > return errorsSummaryEnabled; > } > > > Tested with by running tests with different combinations (tier4-7) and tier1. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed after comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19209/files - new: https://git.openjdk.org/jdk/pull/19209/files/68b20e65..69ffd5b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19209&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19209&range=00-01 Stats: 27 lines in 25 files changed: 0 ins; 3 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/19209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19209/head:pull/19209 PR: https://git.openjdk.org/jdk/pull/19209 From cjplummer at openjdk.org Tue May 14 22:36:01 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 14 May 2024 22:36:01 GMT Subject: RFR: 8332112: Update nsk.share.Log to don't print summary during VM shutdown hook [v2] In-Reply-To: <_SHejzXr7Vq7lwN3XI9-a1ZXsyfKrULFfPVHruXAGkQ=.672bceab-a5fa-4664-9030-dee17113aafc@github.com> References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> <_SHejzXr7Vq7lwN3XI9-a1ZXsyfKrULFfPVHruXAGkQ=.672bceab-a5fa-4664-9030-dee17113aafc@github.com> Message-ID: On Tue, 14 May 2024 22:19:19 GMT, Leonid Mesnik wrote: >> The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook. >> The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 >> >> The cleanup() method flush stream and print summary which should be already printed by complain method. >> >> This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. However, the error should have been printed already in this method. >> >> So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests. 
>> >> Note: The 'verboseOnErrorEnabled' is just not used. >> >> See isVerboseOnErrorEnabled. >> >> public boolean isVerboseOnErrorEnabled() { >> return errorsSummaryEnabled; >> } >> >> >> Tested with by running tests with different combinations (tier4-7) and tier1. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fixed after comments Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19209#pullrequestreview-2056579895 From sspitsyn at openjdk.org Tue May 14 23:16:02 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 14 May 2024 23:16:02 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v3] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> <2lhm2l4CzUnyStTj215njaZg9EcMwwKWxMxtdZTXD8I=.ba8b1275-f16c-4af4-80e5-81ace9b40aa2@github.com> Message-ID: On Tue, 14 May 2024 17:51:03 GMT, Chris Plummer wrote: >>> expEnteringCount/expWaitingCount contain the tested patterns. >> >> I kind of disagree. >> Please, take look at the loop below: >> >> for (int i = 0; i < NUMBER_OF_WAITING_THREADS; i++) { >> expEnteringCount = isVirtual ? 0 : NUMBER_OF_ENTERING_THREADS + i + 1; >> expWaitingCount = isVirtual ? 
0 : NUMBER_OF_WAITING_THREADS - i - 1; >> lockCheck.notify(); // notify waiting threads one by one >> // now the notified WaitingTask has to be blocked on the lockCheck re-enter >> >> // entry count: 1 >> // count of threads waiting to enter: NUMBER_OF_ENTERING_THREADS >> // count of threads waiting to re-enter: i + 1 >> // count of threads waiting to be notified: NUMBER_OF_WAITING_THREADS - i - 1 >> check(lockCheck, expOwnerThread(), expEntryCount(), >> expEnteringCount, >> expWaitingCount); >> } >> >> The comment fixed as you suggest does not look useful anymore as the tested pattern is lost: >> >> // entry count: expOwnerThread() >> // count of threads waiting to enter: expEnteringCount >> // count of threads waiting to re-enter: expEntryCount() >> // count of threads waiting to be notified: expWaitingCount >> check(lockCheck, expOwnerThread(), expEntryCount(), >> expEnteringCount, >> expWaitingCount); >> } >> >> >> I understand your concern but your suggestion is not that good. >> We could remove these comments but the tested pattern will be thrown away with the comments. >> Would it help if we add clarifications that the comments are correct for platform threads only? > > I don't understand the issue with the updated commented. It is precisely telling you what the expected "count" values should be. If you leave the macros in the comment, then the comment is wrong for virtual threads. If you want to keep the macros in the comment, you need to add something like "... or 0 for virtual threads". > > BTW, the "re-enter" comment should continue to be "i + 1". I'm not sure why it was changed to "expEntryCount()". Okay, please, let me explain this one more time. The original comments before method `check()` calls describe the testing scenario but not the numbers expected to be returned by the JVMTI `GetObjectMonitorUsage`. 
For instance, if the testing scenario says: "count of threads waiting to enter: NUMBER_OF_ENTERING_THREADS" then it means there is a real number of these threads waiting to enter the monitor. And it does not matter if they are platform or virtual threads. They are really waiting to enter the monitor. However, the JVMTI `GetObjectMonitorUsage` won't include virtual threads into the returned results. Now, I'm suggesting to add the following header for comments before each `check()` method call: + // The numbers below describe the testing scenario, not the expected results. + // The expected numbers are different for virtual threads because + // they are not supported by JVMTI GetObjectMonitorUsage. Would it work for you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1600769143 From sspitsyn at openjdk.org Tue May 14 23:22:03 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 14 May 2024 23:22:03 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v3] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> <2lhm2l4CzUnyStTj215njaZg9EcMwwKWxMxtdZTXD8I=.ba8b1275-f16c-4af4-80e5-81ace9b40aa2@github.com> Message-ID: On Tue, 14 May 2024 23:13:28 GMT, Serguei Spitsyn wrote: >> I don't understand the issue with the updated commented. It is precisely telling you what the expected "count" values should be. If you leave the macros in the comment, then the comment is wrong for virtual threads. If you want to keep the macros in the comment, you need to add something like "... or 0 for virtual threads". >> >> BTW, the "re-enter" comment should continue to be "i + 1". I'm not sure why it was changed to "expEntryCount()". > > Okay, please, let me explain this one more time. 
> The original comments before method `check()` calls describe the testing scenario but not the numbers expected to be returned by the JVMTI `GetObjectMonitorUsage`. > For instance, if the testing scenario says: "count of threads waiting to enter: NUMBER_OF_ENTERING_THREADS" then it means there is a real number of these threads waiting to enter the monitor. And it does not matter if they are platform or virtual threads. They are really waiting to enter the monitor. However, the JVMTI `GetObjectMonitorUsage` won't include virtual threads into the returned results. > > Now, I'm suggesting to add the following header for comments before each `check()` method call: > > + // The numbers below describe the testing scenario, not the expected results. > + // The expected numbers are different for virtual threads because > + // they are not supported by JVMTI GetObjectMonitorUsage. > > Would it work for you? > BTW, the "re-enter" comment should continue to be "i + 1". > I'm not sure why it was changed to "expEntryCount()". It depends on what are we trying to describe. We either describe the testing scenario (the number of threads doing something) or the expected results. I understood that you wanted to describe the results instead of the scenario. And then it becomes problematic to do so as you can see. 
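
The difference between "the testing scenario" and "the expected results" discussed above can be sketched in Java as follows. This is a minimal illustration, not the test's actual code: the class name, the `scenario*` helpers, and the constant values 2 and 3 are assumptions made for the example; only the `isVirtual ? 0 : ...` expressions are taken from the loop quoted earlier in this thread.

```java
// Hypothetical sketch: separates "what the threads are really doing" (scenario)
// from "what the test expects GetObjectMonitorUsage to report" (expected result).
final class MonitorUsageExpectations {
    static final int NUMBER_OF_ENTERING_THREADS = 2; // assumed value, for illustration only
    static final int NUMBER_OF_WAITING_THREADS = 3;  // assumed value, for illustration only

    // Scenario after (i + 1) notifications: threads blocked on enter or re-enter.
    static int scenarioEntering(int i) {
        return NUMBER_OF_ENTERING_THREADS + i + 1;
    }

    // Scenario after (i + 1) notifications: threads still waiting to be notified.
    static int scenarioWaiting(int i) {
        return NUMBER_OF_WAITING_THREADS - i - 1;
    }

    // Expected result: virtual threads are not reported, so the count is 0 for them.
    static int expEnteringCount(boolean isVirtual, int i) {
        return isVirtual ? 0 : scenarioEntering(i);
    }

    // Expected result: same rule for the threads waiting to be notified.
    static int expWaitingCount(boolean isVirtual, int i) {
        return isVirtual ? 0 : scenarioWaiting(i);
    }
}
```

With platform threads the scenario and the expected result coincide; with virtual threads the scenario numbers still describe real blocked threads while the expected counts collapse to 0, which is why a comment can only be accurate for one of the two readings.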
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1600772051 From sspitsyn at openjdk.org Tue May 14 23:52:07 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 14 May 2024 23:52:07 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v3] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Thu, 2 May 2024 21:50:26 GMT, Chris Plummer wrote: >> src/java.se/share/data/jdwp/jdwp.spec line 1622: >> >>> 1620: (threadObject owner "The platform thread owning this monitor, or nullptr " >>> 1621: "if owned` by a virtual thread or not owned.") >>> 1622: (int entryCount "The number of times the owning platform thread has entered the monitor.") >> >> See the comment I left for the JVMTI spec. We should be more complete in the explanation here, explaining how it is 0 for virtual threads. > > I don't think this has been resolved. Okay, thanks! Fixed now. The update is: --- a/src/java.se/share/data/jdwp/jdwp.spec +++ b/src/java.se/share/data/jdwp/jdwp.spec @@ -1619,9 +1619,11 @@ JDWP "Java(tm) Debug Wire Protocol" (Reply (threadObject owner "The platform thread owning this monitor, or null " "if owned by a virtual thread or not owned.") - (int entryCount "The number of times the owning platform thread has entered the monitor.") + (int entryCount "The number of times the owning platform thread has entered the monitor, " + "or 0 if owned by a virtual thread or not owned.") (Repeat waiters "The total number of platform threads that are waiting to enter or re-enter " - "the monitor, or waiting to be notified by the monitor." + "the monitor, or waiting to be notified by the monitor, or 0 if " + "only virtual threads are waiting or no threads are waiting." 
(threadObject thread "A platform thread waiting for this monitor.") ) ) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1600779564 From sspitsyn at openjdk.org Tue May 14 23:52:09 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 14 May 2024 23:52:09 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v3] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 20:45:58 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: tweaks in JVMTI and JDWP changes > > src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java line 348: > >> 346: /** >> 347: * Returns a List containing a {@link ThreadReference} for >> 348: * each platform thread currently waiting for this object's monitor. > > You need to add "platform" a little below in the `@return` section. Okay, thanks! Fixed now. The update is: diff --git a/src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java b/src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java index 3f3490e84cd..affbf9f6c4c 100644 --- a/src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java +++ b/src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java @@ -355,7 +355,8 @@ Value invokeMethod(ThreadReference thread, Method method, * operation is supported. * * @return a List of {@link ThreadReference} objects. The list - * has zero length if no threads are waiting for the monitor. + * has zero length if no threads are waiting for the monitor, + * or only virtual threads are waiting for the monitor. * @throws java.lang.UnsupportedOperationException if the * target VM does not support this operation. 
 * @throws IncompatibleThreadStateException if any

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1600786965

From sviswanathan at openjdk.org Tue May 14 23:54:10 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 14 May 2024 23:54:10 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]
In-Reply-To: 
References: 
Message-ID: 

On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>> 
>> 
>> Benchmark                                Score       Latest
>> StringIndexOf.advancedWithMediumSub      343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1      1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2      55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern            9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess      4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess    3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess     3.76        3.761       1.000265957x
>> StringIndexOf.success                    9.186       9.713       1.057369911x
>> StringIndexOf.successBig                 14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String     6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char       5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String     6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char       5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String    7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char      16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String    7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char      9901.671    9782.245    0.987938803
> 
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1589: > 1587: case 3: > 1588: case 4: > 1589: __ movl(needleVal, Address(needle, offsetOfFirstByteToCompare)); If the size of the needle is 7 and it is an LL case with NUMBER_OF_NEEDLE_BYTES_TO_COMPARE set as 3: bytesLeftToCompare = 4 (i.e. 7-3); offsetOfFirstByteToCompare = 2 (i.e. 3-1); the movl will be loading bytes 2,3,4,5 So we seem to be missing loading the last byte of the needle. Is that correct? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1735: > 1733: // generated with 32 - (n - k + 1) bits set that ensures matches past the end of the original > 1734: // haystack do not get considered during compares. > 1735: // Mask is generated below with (n-k+1) bits set and not 32- (n-k+1) bits set. Also it will be helpful if we specify what is n and k. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1784: > 1782: __ subq(tmp, haystack_len); > 1783: } > 1784: __ leaq(haystack, Address(rsp, tmp, Address::times_1)); This whole code is repeated in two places. Could be made into a function and used at both places. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1838: > 1836: __ shrq(rax, 1); > 1837: } > 1838: We need to be consistent either use tzcntl, shrl, testl or tzcntq, shrq, testq. 
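
The arithmetic behind the first comment above (line 1589) can be checked with a small Java sketch. It only models which byte indices a fixed-width load covers; the class and method names are hypothetical, and it says nothing about whether the stub compares the remaining byte somewhere else.

```java
// Hypothetical sketch of the index arithmetic in the review comment:
// a load of loadWidth bytes starting at loadOffset covers the indices
// loadOffset .. loadOffset + loadWidth - 1 of the needle.
final class NeedleLoadWindow {
    /** Width in bytes of a 32-bit load such as movl. */
    static final int MOVL_WIDTH = 4;

    /**
     * Returns the index of the first needle byte that lies beyond the load,
     * or -1 if the load reaches the end of the needle.
     */
    static int firstUncoveredIndex(int needleSize, int loadOffset, int loadWidth) {
        int endExclusive = loadOffset + loadWidth;
        return endExclusive >= needleSize ? -1 : endExclusive;
    }
}
```

For the case in the comment (needle size 7, offset 2, 4-byte load) the load covers bytes 2, 3, 4 and 5, so index 6, the last byte of the needle, is the first one outside the window.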
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600787103 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600760538 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600489229 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1600765277 From sspitsyn at openjdk.org Tue May 14 23:56:14 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 14 May 2024 23:56:14 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v4] In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. > > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. 
> > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. > > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. > > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: 1. clarifications in JDWP and JDI spec; 2. clarifications in test comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19030/files - new: https://git.openjdk.org/jdk/pull/19030/files/e7c2d652..8438cf4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=02-03 Stats: 29 lines in 3 files changed: 25 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19030/head:pull/19030 PR: https://git.openjdk.org/jdk/pull/19030 From dholmes at openjdk.org Wed May 15 00:20:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 May 2024 00:20:02 GMT Subject: RFR: 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]] In-Reply-To: References: Message-ID: On Thu, 15 Feb 2024 09:10:51 GMT, Julian Waters wrote: > With clang 13 being the minimum required JDK-8325878, the noreturn bug that requires the ATTRIBUTE_NORETURN workaround now vanishes, and we can use [[noreturn]] directly within HotSpot. 
We should remove the workaround as soon as possible, given the chance If this now works fine for all compilers that is great! Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17868#pullrequestreview-2056671208 From jwaters at openjdk.org Wed May 15 00:26:08 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 15 May 2024 00:26:08 GMT Subject: RFR: 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]] In-Reply-To: References: Message-ID: <1TVgvFT1iI88N63pulM1WOCqMi1yTcRZRh21cHh1zQs=.606170ed-cb06-4626-b1ea-583343d931f2@github.com> On Thu, 15 Feb 2024 09:10:51 GMT, Julian Waters wrote: > With clang 13 being the minimum required JDK-8325878, the noreturn bug that requires the ATTRIBUTE_NORETURN workaround now vanishes, and we can use [[noreturn]] directly within HotSpot. We should remove the workaround as soon as possible, given the chance Thanks David and Kim for the reviews :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17868#issuecomment-2111368153 From jwaters at openjdk.org Wed May 15 00:26:09 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 15 May 2024 00:26:09 GMT Subject: Integrated: 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]] In-Reply-To: References: Message-ID: On Thu, 15 Feb 2024 09:10:51 GMT, Julian Waters wrote: > With clang 13 being the minimum required JDK-8325878, the noreturn bug that requires the ATTRIBUTE_NORETURN workaround now vanishes, and we can use [[noreturn]] directly within HotSpot. We should remove the workaround as soon as possible, given the chance This pull request has now been integrated. 
Changeset: 7b4ba7f9 Author: Julian Waters URL: https://git.openjdk.org/jdk/commit/7b4ba7f90ab9ea5e1070c79534c587dad17d1bdd Stats: 73 lines in 5 files changed: 0 ins; 52 del; 21 mod 8325932: Replace ATTRIBUTE_NORETURN with direct [[noreturn]] Reviewed-by: kbarrett, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/17868 From dholmes at openjdk.org Wed May 15 01:04:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 May 2024 01:04:01 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v4] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 18:10:28 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. 
This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`.
>> 
>> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above.
> 
> Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Address review comments
>    Improve warning for JNI methods, similar to what's described in JEP 472
>    Beef up tests
>  - Address review comments

Hotspot changes look good - notwithstanding discussion about property namespace placement.

Manpage changes also look good.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19213#pullrequestreview-2056696636

From dholmes at openjdk.org Wed May 15 01:04:02 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 15 May 2024 01:04:02 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 13 May 2024 15:32:27 GMT, Alan Bateman wrote:

>> Maurizio Cimadamore has updated the pull request incrementally with three additional commits since the last revision:
>> 
>>  - Fix another typo
>>  - Fix typo
>>  - Add more comments
> 
> src/hotspot/share/runtime/arguments.cpp line 2271:
> 
>> 2269:     } else if (match_option(option, "--illegal-native-access=", &tail)) {
>> 2270:       if (!create_module_property("jdk.module.illegal.native.access", tail, InternalProperty)) {
>> 2271:         return JNI_ENOMEM;
> 
> I think it would be helpful to get guidance on if this is the right way to add this system property, only because this one is not a "module property". The configuration (WriteableProperty + InternalProperty) look right.

So my recollection/understanding is that we use this mechanism to convert module-related `--` flags passed to the VM into system properties that the Java code can then read, but we set them up such that you are not allowed to specify them directly via `-D`. Is the question here whether this new property should be in the `jdk.module` namespace?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1600819327

From duke at openjdk.org Wed May 15 03:11:15 2024
From: duke at openjdk.org (duke)
Date: Wed, 15 May 2024 03:11:15 GMT
Subject: Withdrawn: 8328138: Optimize ArrayEquals on AArch64 & fix potential crash
In-Reply-To: 
References: 
Message-ID: 

On Thu, 14 Mar 2024 06:21:56 GMT, Xiaowei Lu wrote:

> Current implementation of ArrayEquals on AArch64 is quite complex, due to the variety of checks about alignment, tail processing, bus locking and so on. However, modern Arm processors have eased such worries. Besides, we found a crash when using lilliput. So we proposed to use a simple and straightforward flow of ArrayEquals.
> With this simplified ArrayEquals, we observed performance gains on the latest Arm platforms (Neoverse N1 & N2).
> Test case: org.openjdk.bench.java.util.ArraysEquals
> 
> 1x vector length, 64-bit aligned array[0]
> | Test Case              | N1        | N2        |
> |:----------------------:|:---------:|:---------:|
> | testByteFalseBeginning | -21.42%   | -13.37%   |
> | testByteFalseEnd       | 25.79%    | 27.45%    |
> | testByteFalseMid       | 16.64%    | 16.46%    |
> | testByteTrue           | 12.39%    | 24.66%    |
> | testCharFalseBeginning | -5.27%    | -3.08%    |
> | testCharFalseEnd       | 29.29%    | 35.23%    |
> | testCharFalseMid       | 15.13%    | 19.34%    |
> | testCharTrue           | 21.63%    | 33.73%    |
> | Total                  | 11.77%    | 17.55%    |
> 
> A key factor is to decide when we should utilize simd in array equals. An aggressive choice is to enable simd as long as array length exceeds vector length (8 words). The corresponding result is shown above, from which we can see performance regression in both testBeginning cases. To avoid such perf impact, we can set simd threshold to 3x vector length.
> 
> 3x vector length, 64-bit aligned array[0]
> | Test Case              | N1        | N2        |
> |:----------------------:|:---------:|:---------:|
> | testByteFalseBeginning | 8.28%     | 8.64%     |
> | testByteFalseEnd       | 6.38%     | 12.29%    |
> | testByteFalseMid       | 6.17%     | 7.96%     |
> | testByteTrue           | -10.08%   | 3.06%     |
> | testCharFalseBeginning | -1.42%    | 7.23%     |
> | testCharFalseEnd       | 4.05%     | 13.48%    |
> | testCharFalseMid       | 8.79%     | 16.96%    |
> | testCharTrue           | -5.66%    | 10.23%    |
> | Total                  | 2.06%     | 9.98%     |
> 
> 
> In addition to perf improvement, we propose this patch to solve alignment issues in array equals. JDK-8139457 tries to relax alignment of array elements. On the other hand, this misalignment makes it an error to read the whole last word in array equals, in case that the array doesn't occupy the whole word and lilliput is enabled. A detailed explanation quoted from https://github.com/openjdk/jdk/pull/11044#issuecomment-1996771480
> 
>> The root cause is that default behavior of MacroAss...

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18292

From sspitsyn at openjdk.org Wed May 15 06:03:07 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 15 May 2024 06:03:07 GMT
Subject: RFR: 8330969: scalability issue with loaded JVMTI agent [v2]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 10 May 2024 22:09:01 GMT, Daniel D. Daugherty wrote:

> Perhaps this is not what Chris had in mind, but I'm wondering what happens in some
> Thread-A when it is checked and passed by but then Thread-A sets the flag in itself
> after the for-loop has passed it by. Does that Thread-A flag value get lost?

Thank you for the question.
The Thread-A sets the flag optimistically and then re-checks if `sync_protocol_enabled()` and any disabler exists.
It can be a global disabler (`_VTMS_transition_disable_for_all_count > 0`) or a disabler of `Thread-A` only (`java_lang_Thread::VTMS_transition_disable_count(vth()) > 0`).
If any disabler exists then `Thread-A` clears the optimistic settings and goes with the pessimistic approach under protection of `JvmtiVTMSTransition_lock`.
Please, let me know if you still have questions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18937#discussion_r1600987604

From aboldtch at openjdk.org Wed May 15 06:08:05 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 15 May 2024 06:08:05 GMT
Subject: RFR: 8329839: Cleanup ZPhysicalMemoryBacking trace logging
In-Reply-To: 
References: 
Message-ID: <68k1ZMET5gmrf4ywoFDtQO6x2IbPKHd29ozeUw721LE=.a83e8485-8fcd-4654-b24f-ea4a1761f37f@github.com>

On Mon, 8 Apr 2024 09:12:33 GMT, Axel Boldt-Christmas wrote:

> On bsd the MB scaling is only performed on the length and not the base offset so the numbers printed are wrong.
> 
> On all other platforms the `zoffset` type is used incorrectly and should use `zoffset_end` when printing offsets that point to the end of a range.

Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18671#issuecomment-2111651318

From aboldtch at openjdk.org Wed May 15 06:08:06 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 15 May 2024 06:08:06 GMT
Subject: Integrated: 8329839: Cleanup ZPhysicalMemoryBacking trace logging
In-Reply-To: 
References: 
Message-ID: 

On Mon, 8 Apr 2024 09:12:33 GMT, Axel Boldt-Christmas wrote:

> On bsd the MB scaling is only performed on the length and not the base offset so the numbers printed are wrong.
> 
> On all other platforms the `zoffset` type is used incorrectly and should use `zoffset_end` when printing offsets that point to the end of a range.

This pull request has now been integrated.

Changeset: c642f44b
Author: Axel Boldt-Christmas
URL: https://git.openjdk.org/jdk/commit/c642f44bbe1e4cdbc23496a34ddaae30990ce7c0
Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod

8329839: Cleanup ZPhysicalMemoryBacking trace logging

Reviewed-by: stefank, ayang

-------------

PR: https://git.openjdk.org/jdk/pull/18671

From alanb at openjdk.org Wed May 15 06:18:02 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 15 May 2024 06:18:02 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3]
In-Reply-To: 
References: 
Message-ID: <-gTDhrDCjlq9pEoBxG4Qneo9dEf7ErWmvnyOZKGx4mM=.8772d4dd-aa5e-412c-8131-75687cddad5b@github.com>

On Wed, 15 May 2024 00:54:43 GMT, David Holmes wrote:

>> src/hotspot/share/runtime/arguments.cpp line 2271:
>> 
>>> 2269:     } else if (match_option(option, "--illegal-native-access=", &tail)) {
>>> 2270:       if (!create_module_property("jdk.module.illegal.native.access", tail, InternalProperty)) {
>>> 2271:         return JNI_ENOMEM;
>> 
>> I think it would be helpful to get guidance on if this is the right way to add this system property, only because this one is not a "module property". The configuration (WriteableProperty + InternalProperty) look right.
> 
> So my recollection/understanding is that we use this mechanism to convert module-related `--` flags passed to the VM into system properties that the Java code can then read, but we set them up such that you are not allowed to specify them directly via `-D`. Is the question here whether this new property should be in the `jdk.module` namespace?

That's my recollection too. The usage here isn't related to modules, which makes me wonder if this function should be renamed (not by this PR of course) or if we should be using PropertyList_unique_add (with AddProperty, WriteableProperty, InternalProperty) instead. There will be further GNU style options coming that will likely need to map to an internal system property in the same way.

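
The option-to-value mapping under discussion can be illustrated with a small Java sketch. It is a hypothetical model only, not how HotSpot or the JDK handle the flag: the class, enum, and method names are invented for the example, and the only facts taken from this thread are the option prefix `--illegal-native-access=`, the three values `allow`, `warn` and `deny`, and `warn` being the default.

```java
// Hypothetical sketch: turns the tail of a "--illegal-native-access=<value>"
// option into a policy value, falling back to the default when the option is absent.
final class IllegalNativeAccessPolicy {
    enum Mode { ALLOW, WARN, DENY }

    static final String PREFIX = "--illegal-native-access=";

    static Mode parse(String option) {
        if (option == null || !option.startsWith(PREFIX)) {
            return Mode.WARN; // default policy described in the PR text
        }
        String tail = option.substring(PREFIX.length());
        switch (tail) {
            case "allow": return Mode.ALLOW;
            case "warn":  return Mode.WARN;
            case "deny":  return Mode.DENY;
            default:
                throw new IllegalArgumentException("unsupported value: " + tail);
        }
    }
}
```

The sketch keeps the "tail after the prefix" shape of the quoted `match_option` call, but where and under which property name the value is stored is exactly the open question in the messages above.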
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1601002132

From rehn at openjdk.org Wed May 15 06:50:08 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 15 May 2024 06:50:08 GMT
Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v12]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 13 May 2024 10:20:30 GMT, Robbin Ehn wrote:

>> Hi, please consider.
>> 
>> We have code that directly use the asm for call/jumps instead masm.
>> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics.
>> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master)
>> 
>> j offset       jal x0, offset                                       Jump
>> jal offset     jal x1, offset                                       Jump and link
>> jr rs          jalr x0, rs, 0                                       Jump register
>> jalr rs        jalr x1, rs, 0                                       Jump and link register
>> ret            jalr x0, x1, 0                                       Return from subroutine
>> call offset    auipc x1, offset[31:12]; jalr x1, x1, offset[11:0]   Call far-away subroutine
>> tail offset    auipc x6, offset[31:12]; jalr x0, x6, offset[11:0]   Tail call far-away subroutine
>> 
>> But these can only be implemented like this if you have small enough application.
>> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable).
>> We don't have GOT, instead we materialize, so there is still differences between these and ours.
>> 
>> This patch:
>> - Tries to follow these suggested mappings as good we can.
>> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention)
>> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming.
>>   E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'.
>> - I enabled c.j, but right now we never generate it.
>> - As always the macro does no good and are legacy from when code base did not use templates.
>>   (also the x-macros screws up my IDE (vim+rtags))
>> 
>> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump.
>> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1)
>> While looking into our calls it was a bit confusing, this helps.
>> 
>> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4)
>> Re-running tests, had some last minute changes.
>> 
>> Thanks, Robbin
> 
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use la() instead movptr where ok.

Passed t1 with RCC 2047M, t1 with default RCC rolling on fine. (We have a very noisy t1 at the moment.)

Any other takers?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2111708516

From stefank at openjdk.org Wed May 15 07:43:01 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 15 May 2024 07:43:01 GMT
Subject: RFR: 8301464: Code in GenFullCP is still disabled after JDK-8079697 was fixed
In-Reply-To: 
References: 
Message-ID: 

On Tue, 14 May 2024 03:05:27 GMT, xiaotaonan wrote:

> Code in GenFullCP is still disabled after JDK-8079697 was fixed
> note: I have not found any relevant information on why ClassWriter.COMPUTE_FRAMES is disabled in JDK-8079697.

This is not related to GC code, could you remove the hotspot-gc label you added?

------------- PR Comment: https://git.openjdk.org/jdk/pull/19228#issuecomment-2111799762 From duke at openjdk.org Wed May 15 07:59:04 2024 From: duke at openjdk.org (ExE Boss) Date: Wed, 15 May 2024 07:59:04 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v4] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 18:10:28 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>>
>> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above.
>
> Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Address review comments
>    Improve warning for JNI methods, similar to what's described in JEP 472
>    Beef up tests
>  - Address review comments

src/java.base/share/classes/java/lang/Module.java line 334:

> 332:                 System.err.printf("""
> 333:                         WARNING: A native method in %s has been bound
> 334:                         WARNING: %s has been called by %s in %s

Note that this line is still not entirely correct, as for code like:

    // in module a:
    package a;

    import b.Bar;

    public class Foo {
        public static void main(String... args) {
            System.load("JNI library implementing Java_b_Bar_nativeMethod");
            Bar.nativeMethod();
        }
    }

    // in module b:
    package b;

    public class Bar {
        public static native void nativeMethod();
    }

It'll show `Bar` as the caller of `Bar::nativeMethod()`, even though the caller is `Foo` in this case, which is why I initially suggested just omitting the caller from **JNI** linkage warnings.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1601140578

From jsjolen at openjdk.org  Wed May 15 08:41:34 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 15 May 2024 08:41:34 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v86]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
>     static MemoryFile* make_device(const char* descriptive_name);
>     static void free_device(MemoryFile* device);
>
>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                                 MEMFLAGS flag, const NativeCallStack& stack);
>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
>   void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>     MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>   }
>   void ZNMT::commit(zoffset offset, size_t size) {
>     MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>   }
>   void ZNMT::uncommit(zoffset offset, size_t size) {
>     MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>   }
>
>   void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>     // NMT doesn't track mappings at the moment.
>   }
>   void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>     // NMT doesn't track mappings at the moment.
>   }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Fix test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/8168b388..24134209

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=85
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=84-85

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org  Wed May 15 08:57:49 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 15 May 2024 08:57:49 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v87]
In-Reply-To:
References:
Message-ID: <9nlekvr9PXPCeIEKL3g9uSrmrpQ_YVrxAywzRlql1-o=.0e07f283-0dc2-41ed-8fe2-9eeaa03133da@github.com>

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
>     static MemoryFile* make_device(const char* descriptive_name);
>     static void free_device(MemoryFile* device);
>
>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                                 MEMFLAGS flag, const NativeCallStack& stack);
>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
>   void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>     MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>   }
>   void ZNMT::commit(zoffset offset, size_t size) {
>     MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>   }
>   void ZNMT::uncommit(zoffset offset, size_t size) {
>     MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>   }
>
>   void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>     // NMT doesn't track mappings at the moment.
>   }
>   void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>     // NMT doesn't track mappings at the moment.
>   }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Don't look at val, look at key

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/24134209..5e453a99

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=86
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=85-86

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From shade at openjdk.org  Wed May 15 09:13:07 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 15 May 2024 09:13:07 GMT
Subject: RFR: 8332082: Shenandoah: Use SATB active flag for C2 pre-write barrier on x86 and PPC
In-Reply-To:
References:
Message-ID:

On Fri, 10 May 2024 16:13:51 GMT, William Kemper wrote:

> This is consistent with c1 and other platforms.

Hold on. Here is the C2 SATB barrier check, which checks gc-state for `MARKING`:
  https://github.com/openjdk/jdk/blob/2f10a316ff0c5a4c124b94f6fabb38fb119d2c82/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L241-L243

What this PR changes is `ShenandoahBarrierSetAssembler::satb_write_barrier_impl`, which is the SATB barrier for generic assembly code. Yes, some of that may be reached from C2.
Looking around, I see that C1 AArch64, RISC-V, x86 `ShenandoahBarrierSetAssembler::generate_c1_pre_barrier_runtime_stub`-s use `gc_state == MARKING` too:
  https://github.com/openjdk/jdk/blob/2f10a316ff0c5a4c124b94f6fabb38fb119d2c82/src/hotspot/cpu/aarch64/gc/shenandoah/shenandoahBarrierSetAssembler_aarch64.cpp#L695-L698
  https://github.com/openjdk/jdk/blob/2f10a316ff0c5a4c124b94f6fabb38fb119d2c82/src/hotspot/cpu/riscv/gc/shenandoah/shenandoahBarrierSetAssembler_riscv.cpp#L642-L646
  https://github.com/openjdk/jdk/blob/2f10a316ff0c5a4c124b94f6fabb38fb119d2c82/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L226-L228

It looks to me the current status is:
 * Assembler:
    - AArch64, RISC-V: SATBMarkQueue.isActive
    - x86, PPC: gc_state == MARKING
 * C1 (IR):
    - All platforms: SATBMarkQueue.isActive
 * C1 (assembler stub):
    - All platforms: gc_state == MARKING
 * C2 (IR):
    - All platforms: gc-state == MARKING

So, are we better off going the other way around, towards gc-state == MARKING? This would "only" need to rewrite AArch64, RISC-V parts in assembler, and shared C1 barrier part.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19180#issuecomment-2111971770

From amitkumar at openjdk.org  Wed May 15 09:18:10 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 15 May 2024 09:18:10 GMT
Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation
In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com>
References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com>
Message-ID: <42t_xgoBH7Jai8bxfGbKQtAm_wkX_DlisBgYJoMq94M=.10f51aae-4b45-43e5-8fe4-6acb2ca4df2e@github.com>

On Sun, 21 Apr 2024 16:30:43 GMT, Amit Kumar wrote:

> s390x port for recursive locking.
>
> testing:
> - [x] build fastdebug-vm
> - [x] build slowdebug-vm
> - [x] build release-vm
> - [x] build optimized-vm
> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (release-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] tier1 with fastdebug-vm
> - [x] tier1 with slowdebug-vm
> - [x] tier1 with release-vm
>
> *BenchMarks*:
>
> Without Patch:
>
> make test TEST="micro:vm.lang.LockUnlock" MICRO="JAVA_OPTIONS=-XX:LockingMode=1"
>
> Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
> LockUnlock.testContendedLock                      100  avgt   12     15.175 ±    2.071  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12   5412.677 ±  274.280  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     29.293 ±    2.802  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    503.216 ±    8.764  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    508.809 ±   13.565  ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
> With Patch:
>
> Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
> LockUnlock.testContendedLock                      100  avgt   12     13.876 ±    1.561  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12   5323.962 ±  189.045  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     29.545 ±    2.313  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    505.054 ±    5.920  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    502.929 ±    9.131  ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
> Without Patch:
>
> make test TEST="micro:vm.lang.LockUnlock" MICRO="JAVA_OPTIONS=-XX:LockingMode=2"
> Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
> LockUnlock.testContendedLock                      100  avgt   12     14.961 ±    1.189  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12  16136.332 ± 1321.914  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     31.176 ±    1.357  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    461.308 ±   23.610  ns/op
> LockUnlock.testSimpleLockUnlock ...

[JDK-8330849](https://bugs.openjdk.org/browse/JDK-8330849) adds `TestRecursiveMonitorChurn.java`. So rebasing the whole code to fetch that test.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18878#issuecomment-2081345733

From amitkumar at openjdk.org  Wed May 15 09:18:10 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 15 May 2024 09:18:10 GMT
Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation
Message-ID: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com>

s390x port for recursive locking.

testing:
- [x] build fastdebug-vm
- [x] build slowdebug-vm
- [x] build release-vm
- [x] build optimized-vm
- [x] ./test/jdk/java/util/concurrent (fastdebug-vm)
  - [x] with C1
  - [x] with C2
  - [x] with interpreter
- [x] ./test/jdk/java/util/concurrent (release-vm)
  - [x] with C1
  - [x] with C2
  - [x] with interpreter
- [x] ./test/jdk/java/util/concurrent (slowdebug-vm)
  - [x] with C1
  - [x] with C2
  - [x] with interpreter
- [x] tier1 with fastdebug-vm
- [x] tier1 with slowdebug-vm
- [x] tier1 with release-vm

*BenchMarks*:

Without Patch:

make test TEST="micro:vm.lang.LockUnlock" MICRO="JAVA_OPTIONS=-XX:LockingMode=1"

Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
LockUnlock.testContendedLock                      100  avgt   12     15.175 ±    2.071  ns/op
LockUnlock.testRecursiveLockUnlock                100  avgt   12   5412.677 ±  274.280  ns/op
LockUnlock.testRecursiveSynchronization           100  avgt   12     29.293 ±    2.802  ns/op
LockUnlock.testSerialLockUnlock                   100  avgt   12    503.216 ±    8.764  ns/op
LockUnlock.testSimpleLockUnlock                   100  avgt   12    508.809 ±   13.565  ns/op
Finished running test 'micro:vm.lang.LockUnlock'

With Patch:

Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
LockUnlock.testContendedLock                      100  avgt   12     13.876 ±    1.561  ns/op
LockUnlock.testRecursiveLockUnlock                100  avgt   12   5323.962 ±  189.045  ns/op
LockUnlock.testRecursiveSynchronization           100  avgt   12     29.545 ±    2.313  ns/op
LockUnlock.testSerialLockUnlock                   100  avgt   12    505.054 ±    5.920  ns/op
LockUnlock.testSimpleLockUnlock                   100  avgt   12    502.929 ±    9.131  ns/op
Finished running test 'micro:vm.lang.LockUnlock'

Without Patch:

make test TEST="micro:vm.lang.LockUnlock" MICRO="JAVA_OPTIONS=-XX:LockingMode=2"
Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
LockUnlock.testContendedLock                      100  avgt   12     14.961 ±    1.189  ns/op
LockUnlock.testRecursiveLockUnlock                100  avgt   12  16136.332 ± 1321.914  ns/op
LockUnlock.testRecursiveSynchronization           100  avgt   12     31.176 ±    1.357  ns/op
LockUnlock.testSerialLockUnlock                   100  avgt   12    461.308 ±   23.610  ns/op
LockUnlock.testSimpleLockUnlock                   100  avgt   12    479.421 ±   37.541  ns/op
Finished running test 'micro:vm.lang.LockUnlock'

With Patch:

Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
LockUnlock.testContendedLock                      100  avgt   12     16.777 ±    1.543  ns/op
LockUnlock.testRecursiveLockUnlock                100  avgt   12   7060.493 ± 3095.793  ns/op
LockUnlock.testRecursiveSynchronization           100  avgt   12     28.437 ±    1.022  ns/op
LockUnlock.testSerialLockUnlock                   100  avgt   12    439.701 ±    6.002  ns/op
LockUnlock.testSimpleLockUnlock                   100  avgt   12    460.381 ±   23.340  ns/op
Finished running test 'micro:vm.lang.LockUnlock'

Another run with Patch:

Benchmark                                (innerCount)  Mode  Cnt      Score      Error  Units
LockUnlock.testContendedLock                      100  avgt   12     19.482 ±    3.550  ns/op
LockUnlock.testRecursiveLockUnlock                100  avgt   12   5135.956 ±  687.021  ns/op
LockUnlock.testRecursiveSynchronization           100  avgt   12     28.111 ±    1.083  ns/op
LockUnlock.testSerialLockUnlock                   100  avgt   12    440.351 ±   31.667  ns/op
LockUnlock.testSimpleLockUnlock                   100  avgt   12    436.257 ±   20.705  ns/op
Finished running test 'micro:vm.lang.LockUnlock'

PPC port for the same: https://github.com/openjdk/jdk/pull/16611

-------------

Commit messages:
 - Merge branch 'master' into recursive_locking_v1
 - s390x recursive locking port

Changes: https://git.openjdk.org/jdk/pull/18878/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319947
  Stats: 551 lines in 9 files changed: 427 ins; 56 del; 68 mod
  Patch: https://git.openjdk.org/jdk/pull/18878.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18878/head:pull/18878

PR: https://git.openjdk.org/jdk/pull/18878

From rehn at openjdk.org  Wed May 15 09:41:11 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 15 May 2024 09:41:11 GMT
Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register
Message-ID:

Hi, please consider!

Materializing a 48-bit pointer can be done, using an additional register, with:
lui + lui + slli + add + addi
This is 15% faster both on VF2 and in CPU models, compared to movptr().

As we often materialize during calls, there are free registers.

I have chosen just a few spots to use it; many more could use it.
E.g. la() with a tmp register can use li48 instead of movptr.

Running tests now (so far so good); if I screwed up IC calls it should be seen fast.
And benchmarks when hardware is free.
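
For readers who want to see why five instructions are enough, here is a minimal C++ model of the lui + lui + slli + add + addi sequence. It is only a sketch under stated assumptions: the helper names (`split48`, `materialize48`, `Li48Parts`) are hypothetical and not taken from the patch, and the shift amount of 20 is chosen for this illustration; the actual change may split the bits differently. The functions `lui`, `slli`, `add` and `addi` model the RV64 semantics of those instructions (in particular, `lui` sign-extends its 32-bit result).

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of v to 64 bits.
int64_t sign_extend(uint64_t v, unsigned bits) {
  const uint64_t sign = uint64_t{1} << (bits - 1);
  v &= (uint64_t{1} << bits) - 1;
  return static_cast<int64_t>((v ^ sign) - sign);
}

// Models of the RV64 instructions used by the sequence.
uint64_t lui(uint32_t imm20) {  // rd = sext32(imm20 << 12)
  return static_cast<uint64_t>(sign_extend(uint64_t{imm20 & 0xFFFFF} << 12, 32));
}
uint64_t slli(uint64_t rs, unsigned shamt) { return rs << shamt; }
uint64_t add(uint64_t a, uint64_t b) { return a + b; }
uint64_t addi(uint64_t rs, int32_t imm12) {
  return rs + static_cast<uint64_t>(static_cast<int64_t>(imm12));
}

// Hypothetical split of an address into the three immediates (assumption of this sketch).
struct Li48Parts {
  uint32_t upper20;  // immediate for the lui into the temp register
  uint32_t mid20;    // immediate for the lui into the destination register
  int32_t  lower12;  // signed immediate for the final addi
};

Li48Parts split48(uint64_t addr) {
  Li48Parts p;
  p.lower12 = static_cast<int32_t>(sign_extend(addr, 12));
  // Remove the signed low part; what remains is a multiple of 4096.
  uint64_t rest = addr - static_cast<uint64_t>(static_cast<int64_t>(p.lower12));
  p.mid20 = static_cast<uint32_t>((rest >> 12) & 0xFFFFF);
  // Compensate for lui's sign extension; what remains is a multiple of 2^32.
  rest -= lui(p.mid20);
  p.upper20 = static_cast<uint32_t>((rest >> 32) & 0xFFFFF);
  return p;
}

// Replays the five-instruction sequence on the modelled registers.
uint64_t materialize48(uint64_t addr) {
  const Li48Parts p = split48(addr);
  uint64_t tmp = lui(p.upper20);  // lui  tmp, upper20
  uint64_t rd  = lui(p.mid20);    // lui  rd,  mid20
  tmp = slli(tmp, 20);            // slli tmp, tmp, 20
  rd  = add(rd, tmp);             // add  rd,  rd, tmp
  return addi(rd, p.lower12);     // addi rd,  rd, lower12
}
```

The two lui instructions do not depend on each other, which is presumably part of why the sequence can beat a longer dependent chain; that reading is an inference, not something stated in the mail.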
-------------

Commit messages:
 - li48

Changes: https://git.openjdk.org/jdk/pull/19246/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8332265
  Stats: 168 lines in 6 files changed: 130 ins; 4 del; 34 mod
  Patch: https://git.openjdk.org/jdk/pull/19246.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246

PR: https://git.openjdk.org/jdk/pull/19246

From mcimadamore at openjdk.org  Wed May 15 09:59:05 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Wed, 15 May 2024 09:59:05 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v4]
In-Reply-To:
References:
Message-ID: <_MDZPWLFa7qcrmsqMsXDJx6Y5lqfI3E4d6Z6-VKv79g=.ad216d38-b066-47d2-bfcd-31a64052015d@github.com>

On Wed, 15 May 2024 07:55:27 GMT, ExE Boss wrote:

> Note that this line is still not entirely correct, as for code like:

You are correct - the message is however consistent with what is written in JEP 472. I'll discuss with @pron

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1601335120

From eastigeevich at openjdk.org  Wed May 15 09:59:12 2024
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Wed, 15 May 2024 09:59:12 GMT
Subject: Integrated: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives
In-Reply-To:
References:
Message-ID:

On Mon, 13 May 2024 13:03:26 GMT, Evgeny Astigeevich wrote:

> Backout of [JDK-8309271](https://bugs.openjdk.org/browse/JDK-8309271) which has known bugs, possible bugs and performance issues. REDO work is tracked by [JDK-8331749](https://bugs.openjdk.org/browse/JDK-8331749).
>
> Found bugs:
> - When refreshing `CompilerDirectivesAddDCmd::execute` will call `DirectivesStack::hasMatchingDirectives(mh, true)` which only considers the compiler directive which is on the top of the directives stack.
As more than one directive can be added, `CompilerDirectivesAddDCmd::execute` will not behave as expected. > - A Java method with old directives might be in the compilation queue. A request to recompile it with new directives will be ignored. > > There are other concerns: bugs and performance issues. > > Possible bugs: > - `has_matching_directives` might not be cleared. A nmethod might get into the unloading state before `CodeCache::recompile_marked_directives_matches`. If the nmethod has been used to mark a Java method and it is the only nmethod, there will be no nmethod in CodeCache to reach the Java method to clear the mark. > - A Java method might have been compiled with new directives before `CodeCache::recompile_marked_directives_matches`. `CodeCache::recompile_marked_directives_matches` will recompile it again. > - JIT compiler might be compiling a Java method with old directives. A request to recompile it with new directives will be ignored. > > Performance issues: > - Usually directives are updated for a small number of Java methods. If CodeCache has thousands of nmethods, `CodeCache::recompile_marked_directives_matches` will be traversing nmethods most of which don't need recompilation. > > The backout is not clean because of removal of `CompiledMethod`. > > Tested with release and fastdebug builds: tier1 and tier2 passed. This pull request has now been integrated. 
Changeset: 1a944478 Author: Evgeny Astigeevich URL: https://git.openjdk.org/jdk/commit/1a944478a26a766f5a573a1236b642d8e7b0685c Stats: 380 lines in 15 files changed: 3 ins; 347 del; 30 mod 8332111: [BACKOUT] A way to align already compiled methods with compiler directives Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19215 From eastigeevich at openjdk.org Wed May 15 09:59:11 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 15 May 2024 09:59:11 GMT Subject: RFR: 8332111: [BACKOUT] A way to align already compiled methods with compiler directives In-Reply-To: References: Message-ID: On Mon, 13 May 2024 22:43:44 GMT, Vladimir Kozlov wrote: >> What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? > >> What if instead of backing out we will use an experimental JVM flag: `XX:+CompilerDirectivesRefreshSupport`? > > I don't think this is correct way to fix the bug. Thank you, @vnkozlov @dchuyko @shipilev ------------- PR Comment: https://git.openjdk.org/jdk/pull/19215#issuecomment-2112072984 From mcimadamore at openjdk.org Wed May 15 10:37:24 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 15 May 2024 10:37:24 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: <-gTDhrDCjlq9pEoBxG4Qneo9dEf7ErWmvnyOZKGx4mM=.8772d4dd-aa5e-412c-8131-75687cddad5b@github.com> References: <-gTDhrDCjlq9pEoBxG4Qneo9dEf7ErWmvnyOZKGx4mM=.8772d4dd-aa5e-412c-8131-75687cddad5b@github.com> Message-ID: <3w0X9MH3A4P3lX6oIuONx-daTSVe3kWm8z2YWDbHNvg=.9a19ac2b-f46b-4d64-9cdd-f3e70dc3da20@github.com> On Wed, 15 May 2024 06:15:35 GMT, Alan Bateman wrote: >> So my recollection/understanding is that we use this mechanism to convert module-related `--` flags passed to the VM into system properties that the Java code can then read, but we set them up such that you are not allowed to specify them directly via `-D`. 
Is the question here whether this new property should be in the `jdk.module` namespace?
>
> That's my recollection too. The usage here isn't related to modules, which makes me wonder if this function should be renamed (not by this PR of course) or if we should be using PropertyList_unique_add (with AddProperty, WriteableProperty, InternalProperty) instead. There will be further GNU style options coming that will likely need to map to an internal system property in the same way.

I don't fully agree that this option is not module related (which is why I gave it that name). The very definition of illegal native access is related to native access occurring from a module that is outside a specific set. So I think it's 50/50 as to whether this option is module-related or not. Of course I can fix the code if there's something clearly better.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1601386336

From mcimadamore at openjdk.org  Wed May 15 10:40:34 2024
From: mcimadamore at openjdk.org (Maurizio Cimadamore)
Date: Wed, 15 May 2024 10:40:34 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v5]
In-Reply-To:
References:
Message-ID:

> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways:
>
> * `System::load` and `System::loadLibrary` are now restricted methods
> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods
> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation
>
> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls.
In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. > > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. 
Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Refine warning text for JNI method binding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/0d21bf99..daf729f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=03-04 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From alanb at openjdk.org Wed May 15 11:05:14 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 15 May 2024 11:05:14 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v3] In-Reply-To: <3w0X9MH3A4P3lX6oIuONx-daTSVe3kWm8z2YWDbHNvg=.9a19ac2b-f46b-4d64-9cdd-f3e70dc3da20@github.com> References: <-gTDhrDCjlq9pEoBxG4Qneo9dEf7ErWmvnyOZKGx4mM=.8772d4dd-aa5e-412c-8131-75687cddad5b@github.com> <3w0X9MH3A4P3lX6oIuONx-daTSVe3kWm8z2YWDbHNvg=.9a19ac2b-f46b-4d64-9cdd-f3e70dc3da20@github.com> Message-ID: On Wed, 15 May 2024 10:34:01 GMT, Maurizio Cimadamore wrote: > I don't fully agree that this option is not module related (which is why I gave it that name). The very definition of illegal native access is related to native access occurring from a module that is outside a specific set. So I think it's 50/50 as to whether this option is module-related or not. Of course I can fix the code if there's something clearly better. It maps here to a jdk.module.* property so I suppose it's okay. The functions were introduced to create jdk.module.* properties where the values were a set module names or a module path. Maybe these functions should be renamed at some point (not here) as they are more widely useful now. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1601421535

From jsjolen at openjdk.org  Wed May 15 11:48:01 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 15 May 2024 11:48:01 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v88]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
>     static MemoryFile* make_device(const char* descriptive_name);
>     static void free_device(MemoryFile* device);
>
>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                                 MEMFLAGS flag, const NativeCallStack& stack);
>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
>   void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>     MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>   }
>   void ZNMT::commit(zoffset offset, size_t size) {
>     MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>   }
>   void ZNMT::uncommit(zoffset offset, size_t size) {
>     MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>   }
>
>   void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>     // NMT doesn't track mappings at the moment.
>   }
>   void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>     // NMT doesn't track mappings at the moment.
>   }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 175 additional commits since the last revision:

 - Merge remote-tracking branch 'openjdk/master' into nmt-physical-device
 - Don't look at val, look at key
 - Fix test
 - Test closest_leq
 - Test find
 - Remove is_noop() superfluous check
 - Off-by-one error
 - Fix iteration order
 - Fixes
 - Test with opposite ordering
 - ...
and 165 more: https://git.openjdk.org/jdk/compare/6cb564ba...961d89ca ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/5e453a99..961d89ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=87 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=86-87 Stats: 22089 lines in 472 files changed: 11457 ins; 7213 del; 3419 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From lmesnik at openjdk.org Wed May 15 15:00:19 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 15 May 2024 15:00:19 GMT Subject: Integrated: 8332112: Update nsk.share.Log to don't print summary during VM shutdown hook In-Reply-To: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> References: <08rpfgXgTS5RvsqbnwgKdUKo3ADDDGuieSJclVz7leg=.28cded8e-3d2e-4fab-92f6-be89f7ddc6ce@github.com> Message-ID: <5Dt76GGgzyQjfZ99ANQ4ee6zYAwQbF8NwU3uBRScILo=.5a66a525-9c21-49be-912f-5b83d262ede1@github.com> On Sun, 12 May 2024 21:34:41 GMT, Leonid Mesnik wrote: > The nsk.share.Log doing some cleanup and reporting errors in the cleanup method. This method is supposed to be executed by finalizer originally. However, now it is called only during shutdown hook. > The cleanup using Cleaner doesn't work. See https://bugs.openjdk.org/browse/JDK-8330760 > > The cleanup() method flush stream and print summary which should be already printed by complain method. > > This cleanup is not necessary and printing summary usually is just disabled. It is enabled if the test called 'complain' method. However, the error should have been printed already in this method. > > So it would be simple to remove this cleanup and reduce usage of Finalizable in vmTestbase tests. > > Note: The 'verboseOnErrorEnabled' is just not used. > > See isVerboseOnErrorEnabled. 
> > public boolean isVerboseOnErrorEnabled() { > return errorsSummaryEnabled; > } > > > Tested by running tests with different combinations (tier4-7) and tier1. This pull request has now been integrated. Changeset: 61aff6db Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/61aff6db15d5bdda77427af5ce34d0fe43373197 Stats: 168 lines in 30 files changed: 2 ins; 134 del; 32 mod 8332112: Update nsk.share.Log to don't print summary during VM shutdown hook Reviewed-by: dholmes, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/19209 From duke at openjdk.org Wed May 15 15:31:05 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 15 May 2024 15:31:05 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 10 May 2024 12:38:46 GMT, Andrew Haley wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
>> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Hi, > >> I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good? > > Yes, that's right. Hi @theRealAph ! You may find the latest version here: https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b . I gave a short explanation in the commit message, feel free to ask for more details if required. Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks. 
Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2112858007 From alanb at openjdk.org Wed May 15 15:59:14 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 15 May 2024 15:59:14 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v5] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 10:40:34 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. 
This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Refine warning text for JNI method binding src/java.base/share/classes/jdk/internal/module/ModuleBootstrap.java line 871: > 869: return IllegalNativeAccess.WARN; > 870: } else { > 871: fail("Value specified to --illegal-access not recognized:" Typo in the message, should be --illegal-native-access. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1601898238 From mcimadamore at openjdk.org Wed May 15 16:08:17 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 15 May 2024 16:08:17 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v6] In-Reply-To: References: Message-ID: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. 
> > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/daf729f4..1c45e5d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From amitkumar at openjdk.org Wed May 15 16:31:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 15 May 2024 16:31:41 GMT Subject: Integrated: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler Message-ID: We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. 
------------- Commit messages: - s390 patch Changes: https://git.openjdk.org/jdk/pull/15915/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15915&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316935 Stats: 16 lines in 4 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/15915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15915/head:pull/15915 PR: https://git.openjdk.org/jdk/pull/15915 From mdoerr at openjdk.org Wed May 15 16:31:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 May 2024 16:31:41 GMT Subject: Integrated: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:59:44 GMT, Amit Kumar wrote: > We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. Looks good and trivial. You can integrate it after you get a 2nd review or after 24h (whatever happens first). ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15915#pullrequestreview-1643811581 PR Comment: https://git.openjdk.org/jdk/pull/15915#issuecomment-1735560616 From lucy at openjdk.org Wed May 15 16:31:41 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 15 May 2024 16:31:41 GMT Subject: Integrated: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:59:44 GMT, Amit Kumar wrote: > We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. Looks good to me. Changes which are classified "trivial" by a Reviewer can be integrated with just one positive review. Now you've got both: two reviews and a trivial classification. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15915#pullrequestreview-1644295073 From amitkumar at openjdk.org Wed May 15 16:31:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 15 May 2024 16:31:41 GMT Subject: Integrated: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: <5lJCGw9v2YvI60oU2a_3aoCPdwrnOoJyspHSaSaUJiw=.26eb5a02-77fb-4376-87a1-c152c8a7f87f@github.com> On Tue, 26 Sep 2023 09:32:56 GMT, Martin Doerr wrote: >> We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. > > Looks good and trivial. Thanks @TheRealMDoerr for reviewing it. It's a trivial change, Should I integrate it or wait for a another Review ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15915#issuecomment-1735415391 From amitkumar at openjdk.org Wed May 15 16:31:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 15 May 2024 16:31:41 GMT Subject: Integrated: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: <36JaF_Vyl2qSSamBEPmkJz4-y0YbUUlyKcuuvrmF9is=.12d1d839-4b02-4891-969c-39bf6ed4d918@github.com> On Tue, 26 Sep 2023 13:37:54 GMT, Lutz Schmidt wrote: > Now you've got both: two reviews and a trivial classification. Thank you, Lutz and Martin for approving it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15915#issuecomment-1735725343 From amitkumar at openjdk.org Wed May 15 16:31:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 15 May 2024 16:31:41 GMT Subject: Integrated: 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 08:59:44 GMT, Amit Kumar wrote: > We (s390) also needs to update our naming from fast_lock & fast_unlock to MacroAssembler::lightweight_lock and MacroAssembler::lightweight_unlock respectively. This pull request has now been integrated. Changeset: efb7e85e Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/efb7e85ecfc9c6edb2820e1bf72d48958d4c9780 Stats: 16 lines in 4 files changed: 0 ins; 0 del; 16 mod 8316935: [s390x] Use consistent naming for lightweight locking in MacroAssembler Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/15915 From wkemper at openjdk.org Wed May 15 16:43:03 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 15 May 2024 16:43:03 GMT Subject: RFR: 8332082: Shenandoah: Use SATB active flag for C2 pre-write barrier on x86 and PPC In-Reply-To: References: Message-ID: On Fri, 10 May 2024 16:13:51 GMT, William Kemper wrote: > This is consistent with c1 and other platforms. `ShenandoahBarrierSetC2::verify_gc_barriers` is also looking for IR node pattern for `SATBMarkQueue.isActive`, so it'll need rework too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19180#issuecomment-2113004499 From iklam at openjdk.org Wed May 15 16:59:06 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 15 May 2024 16:59:06 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 23:02:27 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. 
The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. >> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. >> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > comments from Ioi LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2058557934 From cjplummer at openjdk.org Wed May 15 19:39:03 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 15 May 2024 19:39:03 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v4] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> <2A25kL9oqh30aBRofiekO9CwmSwgEZ5LEcReUEfmxrQ=.eec2eaf8-dc9a-4a0d-bb42-d9f192f72fb2@github.com> <2lhm2l4CzUnyStTj215njaZg9EcMwwKWxMxtdZTXD8I=.ba8b1275-f16c-4af4-80e5-81ace9b40aa2@github.com> Message-ID: On Tue, 14 May 2024 23:19:14 GMT, Serguei Spitsyn wrote: >> Okay, please, let me explain this one more time. >> The original comments before method `check()` calls describe the testing scenario (or configuration setup before the verifying check) but not the numbers expected to be returned by the JVMTI `GetObjectMonitorUsage`. 
For instance, if the testing scenario says: "count of threads waiting to enter: NUMBER_OF_ENTERING_THREADS" then it means there is a real number of these threads waiting to enter the monitor. And it does not matter if they are platform or virtual threads. They are really waiting to enter the monitor. However, the JVMTI `GetObjectMonitorUsage` won't include virtual threads into the returned results. >> >> Now, I'm suggesting to add the following header for comments before each `check()` method call: >> >> + // The numbers below describe the testing scenario, not the expected results. >> + // The expected numbers are different for virtual threads because >> + // they are not supported by JVMTI GetObjectMonitorUsage. >> >> Would it work for you (I've pushed an update)? > >> BTW, the "re-enter" comment should continue to be "i + 1". >> I'm not sure why it was changed to "expEntryCount()". > > It depends on what are we trying to describe. We either describe the testing scenario (the number of threads doing something) or the expected results. I understood that you wanted to describe the results instead of the scenario. And then it becomes problematic to do so as you can see. Ok, I understand what you are saying now. I think your clarifying comment above helps a lot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602155030 From cjplummer at openjdk.org Wed May 15 19:47:04 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 15 May 2024 19:47:04 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v4] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 20:49:02 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: 1. clarifications in JDWP and JDI spec; 2. clarifications in test comments. 
> > src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java line 369: > >> 367: >> 368: /** >> 369: * Returns an {@link ThreadReference} for the platform thread, if any, > > Pre-existing issue: It should be "a" not "an", but then in the `@return` section we are using "the", so maybe we should use similar wording here: `...the {@link ThreadReference} of the platform thread...` The above comment has not been addressed yet. Should be "a ThreadReference" or "the ThreadReference" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602163046 From cjplummer at openjdk.org Wed May 15 19:55:02 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 15 May 2024 19:55:02 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v4] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Tue, 14 May 2024 23:56:14 GMT, Serguei Spitsyn wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. >> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. 
Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. >> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: 1. clarifications in JDWP and JDI spec; 2. clarifications in test comments. Changes requested by cjplummer (Reviewer). src/hotspot/share/prims/jvmtiEnvBase.cpp line 1535: > 1533: bool is_virtual = java_lang_VirtualThread::is_instance(thread_oop); > 1534: if (is_virtual) { > 1535: skipped++; Do we really need to maintain `skipped`. Isn't not adding to `nWait` the same as skipping? src/hotspot/share/prims/jvmtiEnvBase.cpp line 1583: > 1581: assert(w != nullptr, "sanity check"); > 1582: if (java_lang_VirtualThread::is_instance(thread_oop)) { > 1583: skipped++; I don't think maintaining `skipped` does anything here. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19030#pullrequestreview-2058882144 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602170079 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602171199 From sspitsyn at openjdk.org Wed May 15 20:06:27 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 May 2024 20:06:27 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v5] In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. > > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. > > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. 
> > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. > > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: fixed minor typos in JDI and JDWP specs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19030/files - new: https://git.openjdk.org/jdk/pull/19030/files/8438cf4a..95ea3621 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19030/head:pull/19030 PR: https://git.openjdk.org/jdk/pull/19030 From coleenp at openjdk.org Wed May 15 20:08:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 May 2024 20:08:04 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases In-Reply-To: References: Message-ID: <0FHsLntrHofCG7x31n4Worx5TdfoBZ7jGCTkDqJJU8M=.2e89548b-c70a-43e2-a91e-1a80f954188d@github.com> On Tue, 14 May 2024 12:31:08 GMT, Aleksey Shipilev wrote: > As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. 
> > This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. > > After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. > > Additional testing: > - [x] Performance test reproducer from the bug improves significantly > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` This looks good but one question for ZGC, does ZGC need an OopMapCache::cleanup_old_entries() ? ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19229#pullrequestreview-2058904373 From coleenp at openjdk.org Wed May 15 20:11:02 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 May 2024 20:11:02 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases In-Reply-To: References: Message-ID: On Tue, 14 May 2024 12:31:08 GMT, Aleksey Shipilev wrote: > As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. > > This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. 
But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. > > After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. > > Additional testing: > - [x] Performance test reproducer from the bug improves significantly > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` I did have questions (sorry hit approve too soon). src/hotspot/share/interpreter/oopMapCache.cpp line 545: > 543: > 544: // First search for an empty slot > 545: for (int i = 0; i < _probe_depth; i++) { Does the GlobalCounter read barrier belong around this too? src/hotspot/share/interpreter/oopMapCache.cpp line 593: > 591: bool OopMapCache::has_cleanup_work() { > 592: return Atomic::load(&_old_entries) != nullptr; > 593: } Does this need to notify the ServiceThread? Since the ServiceThread is now a timed wait, maybe this is fine. ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19229#pullrequestreview-2058906401 PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1602184993 PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1602186516 From sspitsyn at openjdk.org Wed May 15 20:12:03 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 May 2024 20:12:03 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v4] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 15 May 2024 19:51:51 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: 1. clarifications in JDWP and JDI spec; 2. clarifications in test comments. > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1535: > >> 1533: bool is_virtual = java_lang_VirtualThread::is_instance(thread_oop); >> 1534: if (is_virtual) { >> 1535: skipped++; > > Do we really need to maintain `skipped`. Isn't not adding to `nWait` the same as skipping? Good suggestion, thanks. Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602188546 From sspitsyn at openjdk.org Wed May 15 20:18:04 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 May 2024 20:18:04 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v4] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 15 May 2024 19:52:36 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: 1. clarifications in JDWP and JDI spec; 2. clarifications in test comments. 
> > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1583: > >> 1581: assert(w != nullptr, "sanity check"); >> 1582: if (java_lang_VirtualThread::is_instance(thread_oop)) { >> 1583: skipped++; > > I don't think maintaining `skipped` does anything here. Thank you for the question. It is needed at the line 1586 below to discount the index: if (java_lang_VirtualThread::is_instance(thread_oop)) { skipped++; } else { // If the thread was found on the ObjectWaiter list, then // it has not been notified. Handle th(current_thread, get_vthread_or_thread_oop(w)); 1586: ret.notify_waiters[i - skipped] = (jthread)jni_reference(calling_thread, th); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602193077 From sspitsyn at openjdk.org Wed May 15 20:21:14 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 May 2024 20:21:14 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v6] In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: <-eAMFeNP4w4hzZm4HHW7RawOJtwcrwjdg09FfVJOqx8=.5c2bea51-eeb3-44f4-94fe-90e3eb01bc00@github.com> > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. 
It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. > > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. > > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. > > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. > > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: simplified a fragment by removing tmp local variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19030/files - new: https://git.openjdk.org/jdk/pull/19030/files/95ea3621..f083fd65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=04-05 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19030/head:pull/19030 PR: 
https://git.openjdk.org/jdk/pull/19030

From duke at openjdk.org Wed May 15 20:26:17 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Wed, 15 May 2024 20:26:17 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]
In-Reply-To:
References:
Message-ID:

On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark                               Score       Latest      Ratio
>> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
>> StringIndexOf.success                   9.186       9.713       1.057369911x
>> StringIndexOf.successBig                14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Rearrange; add lambdas for clarity

First pass at StringIndexOfHuge.java and IndexOf.java

test/jdk/java/lang/StringBuffer/IndexOf.java line 40:

> 38: private static boolean failure = false;
> 39: public
static void main(String[] args) throws Exception {
> 40: String testName = "IndexOf";

indentation

test/jdk/java/lang/StringBuffer/IndexOf.java line 47:

> 45: char[] haystack_16 = new char[128];
> 46:
> 47: for (int i = 0; i < 128; i++) {

you can use `char` instead of `int` as iterator

test/jdk/java/lang/StringBuffer/IndexOf.java line 54:

> 52: // for (int i = 1; i < 128; i++) {
> 53: // haystack_16[i] = (char) (i);
> 54: // }

dead code

test/jdk/java/lang/StringBuffer/IndexOf.java line 64:

> 62: Charset hs_charset = StandardCharsets.UTF_16;
> 63: Charset needleCharset = StandardCharsets.ISO_8859_1;
> 64: // Charset needleCharset = StandardCharsets.UTF_16;

Move from main() into a function that takes `needleCharset` as a parameter, then call that function twice.

test/jdk/java/lang/StringBuffer/IndexOf.java line 81:

> 79: sourceBuffer = new StringBuffer(sourceString);
> 80: targetString = generateTestString(10, 11);
> 81: } while (sourceString.indexOf(targetString) != -1);

Should really keep the original test unmodified and add new tests as needed

test/jdk/java/lang/StringBuffer/IndexOf.java line 83:

> 81: shs = "$&),,18+-!'8)+";
> 82: endNeedle = "8)-";
> 83: l_offset = 9;

dead code

test/jdk/java/lang/StringBuffer/IndexOf.java line 89:

> 87: StringBuffer bshs = new StringBuffer(shs);
> 88:
> 89: // printStringBytes(shs.getBytes(hs_charset));

dead code (and next two comments)

test/jdk/java/lang/StringBuffer/IndexOf.java line 90:

> 88:
> 89: // printStringBytes(shs.getBytes(hs_charset));
> 90: for (int i = 0; i < 200000; i++) {

This won't be a deterministic way to reach the intrinsic. I would suggest copying the idea from test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/Poly1305UnitTestDriver.java i.e. Have two `@run main` invocations at the top of this file, one with default parameters, one with `-Xcomp -XX:-TieredCompilation`. You don't need a 'driver' program, that was to handle something else.
    /*
     * @test
     * @modules java.base/com.sun.crypto.provider
     * @run main java.base/com.sun.crypto.provider.Poly1305KAT
     * @summary Unit test for com.sun.crypto.provider.Poly1305.
     */

    /*
     * @test
     * @modules java.base/com.sun.crypto.provider
     * @summary Unit test for IntrinsicCandidate in com.sun.crypto.provider.Poly1305.
     * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+ForceUnreachable java.base/com.sun.crypto.provider.Poly1305KAT
     */

test/jdk/java/lang/StringBuffer/IndexOf.java line 126:

> 124: int aNewLength = getRandomIndex(min, max);
> 125: for (int y = 0; y < aNewLength; y++) {
> 126: int achar = generator.nextInt(30) + 30;

This will only ever generate LL cases, i.e. chars from [30,60]. Could be parametrized to also produce utf16 if instead of 30, offset was in the unicode range

test/jdk/java/lang/StringBuffer/IndexOf.java line 199:

> 197: System.out.println("Source="+sourceString.substring(hsBegin, hsBegin + haystackLen));
> 198: System.out.println("Target="+targetString.substring(nBegin, nBegin + needleLen));
> 199: System.out.println("haystackLen="+haystackLen+" neeldeLen="+needleLen+" hsBegin="+hsBegin+" nBegin="+nBegin+

This looks like 'development scaffolding' (i.e. printf debugging) that was meant to be removed

test/jdk/java/lang/StringBuffer/IndexOf.java line 237:

> 235: + sourceBuffer.toString() + " len Buffer = " + sourceBuffer.toString().length());
> 236: System.err.println(" naive = " + naiveFind(sourceBuffer.toString(), targetString, 0) + ", IndexOf = "
> 237: + sourceBuffer.indexOf(targetString));

More tracing left behind here and rest of this function (original just recorded failure and moved along)

test/jdk/java/lang/StringBuffer/IndexOf.java line 284:

> 282:
> 283: // Note: it is possible although highly improbable that failCount will
> 284: // be > 0 even if everthing is working ok

This sounds like either a bug or a testcase bug? Same as line 301, `extremely remote possibility of > 1 match`?

test/jdk/java/lang/StringBuffer/IndexOf.java line 295:

> 293: sourceString = generateTestString(99, 100);
> 294: sourceBuffer = new StringBuffer(sourceString);
> 295: targetString = generateTestString(10, 11);

Generate a random int [0,1,2] for LL, UU, UL, pass that as parameter to generateTestString() to test the other paths. Same for other tests in this file using this pattern.

This test is specific to haystacklen=100, needlelen=10. What about other haystack/needle sizes to exercise all the paths in the intrinsic assembler (i.e. haystack >=, <=32, needlelen ={1,2,3,4,5..32..})? Elsewhere already?

test/jdk/java/lang/StringBuffer/IndexOf.java line 360:

> 358: System.err.println(" sAnswer = " + sAnswer + ", sbAnswer = " + sbAnswer);
> 359: System.err.println(" testString = '" + testString + "'");
> 360: System.err.println(" testBuffer = '" + testBuffer + "'");

tracing left here and further down

test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 2:

> 1: /*
> 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved.

New file, just 2024

test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 81:

> 79: lateMatchString16 = dataStringHuge16.substring(dataStringHuge16.length() - 31);
> 80:
> 81: searchString = "oscar";

Would have liked to see a few more small needles (i.e. to test/verify individual switch statement cases)

test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 94:

> 92:
> 93:
> 94: /** IndexOf Micros */

Would really have preferred @Param{"LL", "UU", "UL"}; it would be easier to spot if there are any copy/paste errors.
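
For the generateTestString() comments above (LL-only output, and picking LL/UU/UL per call), something along these lines might work. This is only a rough sketch, not code from the PR: the `EncodedTestStrings` class, the `Kind` enum, the `generate` and `naiveFind` helpers, and the 0x0400 base for the non-Latin-1 range are all made up for illustration.

```java
import java.util.Random;

final class EncodedTestStrings {
    /** Hypothetical selector for the character range of a generated string. */
    enum Kind { LATIN1, UTF16 }

    /** Random string with length in [min, max); requires max > min. */
    static String generate(Random rnd, Kind kind, int min, int max) {
        int len = min + rnd.nextInt(max - min);
        // Latin-1 keeps the original base of 30; the UTF-16 base is an
        // arbitrary choice above 0xFF, so the string cannot be Latin-1 coded.
        int base = (kind == Kind.LATIN1) ? 30 : 0x0400;
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) (base + rnd.nextInt(30)));
        }
        return sb.toString();
    }

    /** Straightforward reference search used to cross-check indexOf. */
    static int naiveFind(String haystack, String needle) {
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            if (haystack.regionMatches(i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // {haystack kind, needle kind}: LL, UU, UL
        Kind[][] pairings = {
            {Kind.LATIN1, Kind.LATIN1},
            {Kind.UTF16, Kind.UTF16},
            {Kind.UTF16, Kind.LATIN1},
        };
        for (Kind[] p : pairings) {
            for (int iter = 0; iter < 1000; iter++) {
                String haystack = generate(rnd, p[0], 99, 100);
                String needle = generate(rnd, p[1], 1, 11);
                if (haystack.indexOf(needle) != naiveFind(haystack, needle)) {
                    throw new AssertionError("mismatch for " + p[0] + "/" + p[1]);
                }
            }
        }
        System.out.println("all pairings agree with naiveFind");
    }
}
```

Note that in this sketch the UL pairing can never produce a match, since the haystack contains no Latin-1 characters; a real test would presumably want a mixed haystack so the match path is exercised as well.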
test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 132: > 130: @Benchmark > 131: public int searchHugeLargeSubstring() { > 132: return dataStringHuge.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); .repeat() call and string concatenation shouldn't be part of the benchmark (here and several other @Benchmark functions in this file) since it will detract from the measurement. (String concatenation gets converted (by javac) into StringBuilder().append().append()....append().toString()) test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 242: > 240: @Benchmark > 241: public int search16HugeLargeSubstring16() { > 242: return dataStringHuge16.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); `search16HugeLargeSubstring16` implies UU, but with `"B".repeat(30) + "X" + "A".repeat(30)` is UL ------------- PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2058681000 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602136400 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602140456 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602137044 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602158011 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602160330 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602144091 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602147967 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602153043 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602181943 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602162587 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602167728 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602184697 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602198158 PR Review Comment: 
https://git.openjdk.org/jdk/pull/16753#discussion_r1602171418 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602200123 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602133525 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602130679 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602047091 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602115797 From duke at openjdk.org Wed May 15 20:26:17 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 15 May 2024 20:26:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:21:37 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 47: > >> 45: char[] haystack_16 = new char[128]; >> 46: >> 47: for (int i = 0; i < 128; i++) { > > you can use `char` instead of `int` as iterator combine into single loop haystack[i] = (char) i; haystack_16[i] = (char) (i + 256); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602141543 From sspitsyn at openjdk.org Wed May 15 20:29:17 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 May 2024 20:29:17 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v7] In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. 
Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. > > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. > > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. > > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. 
> > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: UNDO: removed incorrect simplification that removed a tmp local skipped ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19030/files - new: https://git.openjdk.org/jdk/pull/19030/files/f083fd65..7091a3f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19030&range=05-06 Stats: 6 lines in 1 file changed: 2 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19030/head:pull/19030 PR: https://git.openjdk.org/jdk/pull/19030 From sspitsyn at openjdk.org Wed May 15 20:37:03 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 May 2024 20:37:03 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v4] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: <0YzEv8zaHhoN7u7Iq8xN63266pMnnpGAeC-UkAbWLtg=.eb8764d0-fe4e-4bb2-895b-aaf3c97f5f85@github.com> On Wed, 15 May 2024 20:09:52 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1535: >> >>> 1533: bool is_virtual = java_lang_VirtualThread::is_instance(thread_oop); >>> 1534: if (is_virtual) { >>> 1535: skipped++; >> >> Do we really need to maintain `skipped`. Isn't not adding to `nWait` the same as skipping? > > Good suggestion, thanks. Fixed now. 
I've undone this suggested simplification as it has not worked out. Please, see my answer on your next comment. >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1583: >> >>> 1581: assert(w != nullptr, "sanity check"); >>> 1582: if (java_lang_VirtualThread::is_instance(thread_oop)) { >>> 1583: skipped++; >> >> I don't think maintaining `skipped` does anything here. > > Thank you for the question. It is needed at the line 1586 below to discount the index: > > if (java_lang_VirtualThread::is_instance(thread_oop)) { > skipped++; > } else { > // If the thread was found on the ObjectWaiter list, then > // it has not been notified. > Handle th(current_thread, get_vthread_or_thread_oop(w)); > 1586: ret.notify_waiters[i - skipped] = (jthread)jni_reference(calling_thread, th); > } BTW: The simplification (getting rid of local `skipped`) you requested in previous comment damaged this fragment by making it incorrect. Here we need the `nWait` to account for virtual threads as well. Otherwise, the loop is shorted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602212314 PR Review Comment: https://git.openjdk.org/jdk/pull/19030#discussion_r1602210128 From sviswanathan at openjdk.org Wed May 15 21:13:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 15 May 2024 21:13:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>>
>> Benchmark                               Score       Latest      Ratio
>> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
>> StringIndexOf.success                   9.186       9.713       1.057369911x
>> StringIndexOf.successBig                14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1476:

> 1474: _masm);
> 1475:
> 1476: __ movq(r11, -1);

There doesn't seem to be a use of r11 below in this function.

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1493:

> 1491: // Assume r10 is n - k
> 1492: __ leaq(last, Address(haystack, r10, Address::times_1, isU ? -30 : -31));
> 1493: __ jmpb(temp);

Need to pass r10 as parameter. Also temp label could be given a better name.

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1502:

> 1500:
> 1501: __ cmpq(hsPtrRet, last);
> 1502: __ cmovq(Assembler::aboveEqual, hsPtrRet, last);

cmovq is expensive, better sequence would be:

    __ cmpq(hsPtrRet, last);
    __ jb_b(temp);
    __ movq(hsPtrRet, last);

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510:

> 1508: compare_big_haystack_to_needle(sizeKnown, size, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength,
> 1509: needleLen, isU, DO_EARLY_BAILOUT, eq_mask, temp2, r10, _masm);
> 1510:

At this point hsLength is not the remaining length from hsPtrRet, would that cause a problem? If not, all the special paths in compare_big_haystack_to_needle need not be generated on this call.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602016421
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1601943761
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602251994
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1602010926

From cjplummer at openjdk.org Wed May 15 21:15:04 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 15 May 2024 21:15:04 GMT
Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v7]
In-Reply-To:
References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com>
Message-ID:

On Wed, 15 May 2024 20:29:17 GMT, Serguei Spitsyn wrote:

>> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one.
>> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. >> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: UNDO: removed incorrect simplification that removed a tmp local skipped Changes look good now. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19030#pullrequestreview-2059028269 From sspitsyn at openjdk.org Wed May 15 22:03:04 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 May 2024 22:03:04 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v7] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 15 May 2024 20:29:17 GMT, Serguei Spitsyn wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. >> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. >> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. 
>> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: UNDO: removed incorrect simplification that removed a tmp local skipped Thank you for review, Chris! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19030#issuecomment-2113523235 From zgu at openjdk.org Wed May 15 22:05:01 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 15 May 2024 22:05:01 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases In-Reply-To: <0FHsLntrHofCG7x31n4Worx5TdfoBZ7jGCTkDqJJU8M=.2e89548b-c70a-43e2-a91e-1a80f954188d@github.com> References: <0FHsLntrHofCG7x31n4Worx5TdfoBZ7jGCTkDqJJU8M=.2e89548b-c70a-43e2-a91e-1a80f954188d@github.com> Message-ID: On Wed, 15 May 2024 20:05:12 GMT, Coleen Phillimore wrote: > This looks good but one question for ZGC, does ZGC need an OopMapCache::cleanup_old_entries() ? We still call `OopMapCache::cleanup_old_entries() ` after STW pause, but now concurrent phase also can accumulate old entries, should we unify them? e.g. all depend on service thread to clean up old entries? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2113526362 From sspitsyn at openjdk.org Thu May 16 02:41:17 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 May 2024 02:41:17 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers Message-ID: The following RFE was fixed recently: [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. This update is to make it clear that `nullptr` is C programming language `null` pointer. I think we do not need a CSR for this fix. Testing: N/A (not needed) ------------- Commit messages: - 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers Changes: https://git.openjdk.org/jdk/pull/19257/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326716 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19257/head:pull/19257 PR: https://git.openjdk.org/jdk/pull/19257 From fyang at openjdk.org Thu May 16 06:47:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 16 May 2024 06:47:01 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register In-Reply-To: References: Message-ID: On Wed, 15 May 2024 09:34:11 GMT, Robbin Ehn wrote: > Hi, please consider! > > Materializing a 48-bit pointer, using an additional register, we can do with: > lui + lui + slli + add + addi > This 15% faster both on VF2 and in CPU models, compared to movptr(). > > As we often materialize during calls there is free registers. > > I have choose just a few spot to use it, many more can use. > E.g. la() with tmp register can use li48 instead of movptr. 
> > Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. > And benchmarks when hardware is free. Hi, This looks interesting. Could all the movptr callsites be changed? I am asking this as I am a bit worried about the complexity / reward ratio when we have both movptr and li48 which are the same in functionality. ------------- PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2059730141 From shade at openjdk.org Thu May 16 07:25:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 May 2024 07:25:03 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases In-Reply-To: References: Message-ID: On Wed, 15 May 2024 20:06:31 GMT, Coleen Phillimore wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. >> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. 
>> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > src/hotspot/share/interpreter/oopMapCache.cpp line 545: > >> 543: >> 544: // First search for an empty slot >> 545: for (int i = 0; i < _probe_depth; i++) { > > Does the GlobalCounter read barrier belong around this too? I don't think so: GlobalCounter guards against the reclamation of `OopMapCacheEntry`-es, so we only need to protect the paths that access their contents. We don't need it for anything else, like just poking into the array slots here. I used to have the critical section that spans this entire method, but reasoned it was excessive. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1602752873 From shade at openjdk.org Thu May 16 07:55:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 May 2024 07:55:28 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: References: Message-ID: > As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. > > This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. 
> > After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. > > Additional testing: > - [x] Performance test reproducer from the bug improves significantly > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Notify service thread on first enqueue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19229/files - new: https://git.openjdk.org/jdk/pull/19229/files/455687ad..29dee418 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19229&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19229&range=00-01 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19229/head:pull/19229 PR: https://git.openjdk.org/jdk/pull/19229 From shade at openjdk.org Thu May 16 07:55:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 May 2024 07:55:28 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 20:07:57 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Notify service thread on first enqueue > > src/hotspot/share/interpreter/oopMapCache.cpp line 593: > >> 591: bool OopMapCache::has_cleanup_work() { >> 592: return Atomic::load(&_old_entries) != nullptr; >> 593: } > > Does this need to notify the ServiceThread? Since the ServiceThread is now a timed wait, maybe this is fine. 
Right. I have not realized the service thread timed wait addition was recent. So it would become a problem if we pull this patch to the JDK release where service thread needs to be explicitly notified. We need to do this carefully, though, since we don't want to acquire `ServiceLock` all that often or inundate the thread with requests. So I added a simple notification when queue is populated with the first element. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1602795482 From rehn at openjdk.org Thu May 16 07:58:02 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 16 May 2024 07:58:02 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register In-Reply-To: References: Message-ID: <6hYm5BI8U_kB2R5XolQoBK9dXvmlmlynwhm7pt7Pi-g=.b168fee4-bff2-42e5-8816-b97776135a2c@github.com> On Wed, 15 May 2024 09:34:11 GMT, Robbin Ehn wrote: > Hi, please consider! > > Materializing a 48-bit pointer, using an additional register, we can do with: > lui + lui + slli + add + addi > This 15% faster both on VF2 and in CPU models, compared to movptr(). > > As we often materialize during calls there is free registers. > > I have choose just a few spot to use it, many more can use. > E.g. la() with tmp register can use li48 instead of movptr. > > Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. > And benchmarks when hardware is free. Yes, but it's a long term job, as you need to free a register in many cases. (in non-call sites places) All callsites should be easy to change as you have plenty of callee saved registers which are already saved when using movptr. As li48 is faster than li when using more than 32-bits these cases should also use li48. I.e. mv t0, addr But mv is fishy partly because of RegisterOrConstant constructor, so we can't tell in mv if this was an address or not. 
I have been looking into cleaning that up, so mv with literal and mv with address are two separate cases. One way to keep them apart would be to use e.g. "li reg, literal" and "li48 reg, temp_reg, address". As there is a lot of work, this PR is intended as the first step with the hardest pieces implemented already, i.e. li48 is ready to go. If we also fix mov_metadata la()->li48 we reduce static call stub size down from 12 to 10 instructions, which is significant. That one is on my todo list. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2114315982 From shade at openjdk.org Thu May 16 07:58:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 May 2024 07:58:05 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: References: <0FHsLntrHofCG7x31n4Worx5TdfoBZ7jGCTkDqJJU8M=.2e89548b-c70a-43e2-a91e-1a80f954188d@github.com> Message-ID: <4T5P_gVCfrkiSHmRtR7Qew6wmmit9Pa0Z3z2jZwZt0A=.8f190081-bef9-48e0-9c8d-7d1032e0d204@github.com> On Wed, 15 May 2024 22:02:55 GMT, Zhengyu Gu wrote: > > This looks good but one question for ZGC, does ZGC need an OopMapCache::cleanup_old_entries() ? > > We still call `OopMapCache::cleanup_old_entries() ` after STW pause, but now concurrent phase also can accumulate old entries, should we unify them? e.g. all depend on service thread to clean up old entries? Yes. I think the post-GC cleanup is opportunistic after this patch: it is not necessary, since service thread is supposed to catch up with cleanups, but we might still do it after the phases that we know might generate lots of old entries. This is why I left current calls to `cleanup_old_entries()` in current paths, and we might consider adding those for ZGC paths as well. I don't see a strong pressure to do it here, though.
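For readers following the thread, the scheme discussed above (enqueue retired entries, notify the service thread only when the queue goes from empty to non-empty, and let any caller drain the queue opportunistically) can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not the HotSpot code: `Entry`, `OldEntryQueue`, and `enqueue` are hypothetical names, `std::atomic` stands in for HotSpot's `Atomic`, and the `GlobalCounter` synchronization that the real patch needs before freeing entries is deliberately left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a retired cache entry.
struct Entry {
  int id;
  Entry* next;
  explicit Entry(int i) : id(i), next(nullptr) {}
};

// Hypothetical lock-free list of entries waiting to be reclaimed.
class OldEntryQueue {
  std::atomic<Entry*> _head{nullptr};

 public:
  // Pushes an entry. Returns true when the queue was empty before the push,
  // which is the only case where the caller would need to notify the
  // cleanup thread.
  bool enqueue(Entry* e) {
    Entry* head = _head.load(std::memory_order_relaxed);
    do {
      e->next = head;
    } while (!_head.compare_exchange_weak(head, e,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
    return head == nullptr;
  }

  // Cheap check that a service thread could poll.
  bool has_cleanup_work() const {
    return _head.load(std::memory_order_acquire) != nullptr;
  }

  // Detaches the whole list and frees it. Returns the number of freed entries.
  // A real implementation must first wait for concurrent readers to finish
  // (the patch uses GlobalCounter for that); this sketch does not.
  int cleanup() {
    Entry* e = _head.exchange(nullptr, std::memory_order_acquire);
    int freed = 0;
    while (e != nullptr) {
      Entry* next = e->next;
      delete e;
      e = next;
      freed++;
    }
    return freed;
  }
};
```

In this sketch only the first `enqueue` after a drain reports `true`, so a notification would be sent once per batch rather than once per entry, and `cleanup` is safe to call from either a periodic thread or an opportunistic post-GC path.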
------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2114328579 From alanb at openjdk.org Thu May 16 08:01:05 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 16 May 2024 08:01:05 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers In-Reply-To: References: Message-ID: On Thu, 16 May 2024 02:37:40 GMT, Serguei Spitsyn wrote: > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. > > Testing: N/A (not needed) src/hotspot/share/prims/jvmti.xml line 1008: > 1006: function descriptions. Empty lists, arrays, sequences, etc are > 1007: returned as nullptr which is C programming language > 1008: null pointer. Shouldn't this be "NULL"? In any case, I think it would be helpful to expand this a bit to make it clear that usages of "nullptr" in parameter and error descriptions should be read or treated as "NULL" when developing an agent in C rather than C++. 
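As a small illustration of the point above, the following standalone C++ sketch shows a function that returns a null pointer for an empty list, and makes explicit that the result compares equal to both `nullptr` and `NULL`. `copy_list` is a hypothetical helper and not a JVMTI function; an agent written in C would use only the `NULL` form of the check.

```cpp
#include <cassert>
#include <cstddef>   // NULL, size_t
#include <cstdlib>   // std::malloc, std::free
#include <cstring>   // std::memcpy

// Hypothetical helper: copies `count` ints into freshly allocated memory.
// An empty input is reported as a null pointer, mirroring the convention
// that empty lists are returned as nullptr.
int* copy_list(const int* src, int count) {
  if (src == nullptr || count <= 0) {
    return nullptr;
  }
  const size_t bytes = sizeof(int) * static_cast<size_t>(count);
  int* out = static_cast<int*>(std::malloc(bytes));
  if (out == nullptr) {
    return nullptr;
  }
  std::memcpy(out, src, bytes);
  return out;
}

// The same value can be tested with either spelling of the null pointer.
bool is_empty_result(const int* result) {
  return result == NULL;  // equivalent to `result == nullptr` in C++
}
```

A caller would typically write `if (is_empty_result(list)) { ... }` and release a non-empty result with `std::free`.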
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1602803174 From jpai at openjdk.org Thu May 16 11:24:10 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 16 May 2024 11:24:10 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v6] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 16:08:17 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment src/java.base/share/classes/sun/launcher/resources/launcher.properties line 72: > 70: \ by code in modules for which native access is not explicitly enabled.\n\ > 71: \ is one of "deny", "warn" or "allow".\n\ > 72: \ This option will be removed in a future release.\n\ Should this specify the current default value for this option if it isn't set? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1603157916 From rrich at openjdk.org Thu May 16 11:41:03 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 May 2024 11:41:03 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:10:08 GMT, Tobias Holenstein wrote: > The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. > > There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. Looks reasonable to me. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19102#pullrequestreview-2060486108 From jpai at openjdk.org Thu May 16 11:45:06 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 16 May 2024 11:45:06 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v6] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 16:08:17 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment Hello Maurizio, in the current mainline, we have code in `LauncherHelper` https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/launcher/LauncherHelper.java#L636 where we enable native access to all unnamed modules if an executable jar with `Enable-Native-Access: ALL-UNNAMED` manifest is being launched. For such executable jars, what is the expected semantics when the launch also explicitly has a `--enable-native-access=M1,M2` option. Something like: java --enable-native-access=M1,M2 -jar foo.jar where `foo.jar` has `Enable-Native-Access: ALL-UNNAMED` in its manifest. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2115005638 From mcimadamore at openjdk.org Thu May 16 11:50:05 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 16 May 2024 11:50:05 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v6] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 11:42:48 GMT, Jaikiran Pai wrote: > Hello Maurizio, in the current mainline, we have code in `LauncherHelper` https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/launcher/LauncherHelper.java#L636 where we enable native access to all unnamed modules if an executable jar with `Enable-Native-Access: ALL-UNNAMED` manifest is being launched. For such executable jars, what is the expected semantics when the launch also explicitly has a `--enable-native-access=M1,M2` option. Something like: > > ``` > java --enable-native-access=M1,M2 -jar foo.jar > ``` > > where `foo.jar` has `Enable-Native-Access: ALL-UNNAMED` in its manifest. The options are additive - e.g. 
the enable-native-access in the manifest will add up to the enable-native-access in the command line, so effectively it will be as if you were running with --enable-native-access=M1,M2,ALL-UNNAMED > src/java.base/share/classes/sun/launcher/resources/launcher.properties line 72: > >> 70: \ by code in modules for which native access is not explicitly enabled.\n\ >> 71: \ is one of "deny", "warn" or "allow".\n\ >> 72: \ This option will be removed in a future release.\n\ > > Should this specify the current default value for this option if it isn't set? We already do, see https://github.com/openjdk/jdk/pull/19213/files/1c45e5d56c429205ab8185481bc1044a86ab3bc6#diff-d05029afe6aed86f860a10901114402f1f6af4fe1e4b46d883141ab1d2a527b8R582 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2115012361 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1603195671 From jpai at openjdk.org Thu May 16 11:58:06 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 16 May 2024 11:58:06 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v6] In-Reply-To: References: Message-ID: <6OAF_6PrZCouzDuhvwc8J6TSIUmBEc4HEi9Z-155BJ8=.4968dd9f-3939-4a49-9f29-57a901a7d12a@github.com> On Thu, 16 May 2024 11:47:13 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/sun/launcher/resources/launcher.properties line 72: >> >>> 70: \ by code in modules for which native access is not explicitly enabled.\n\ >>> 71: \ is one of "deny", "warn" or "allow".\n\ >>> 72: \ This option will be removed in a future release.\n\ >> >> Should this specify the current default value for this option if it isn't set? 
> > We already do, see > https://github.com/openjdk/jdk/pull/19213/files/1c45e5d56c429205ab8185481bc1044a86ab3bc6#diff-d05029afe6aed86f860a10901114402f1f6af4fe1e4b46d883141ab1d2a527b8R582 This is slightly different from what we do in the other PR for unsafe memory access where we specify the default in the launcher's help text too https://github.com/openjdk/jdk/pull/19174/files#diff-799093930b698e97b23ead98c6496261af1e2e33ec7aa9261584870cbee8a5eaR219. I don't have a strong opinion on this, I think either one is fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1603205279 From mcimadamore at openjdk.org Thu May 16 12:20:05 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 16 May 2024 12:20:05 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v6] In-Reply-To: <6OAF_6PrZCouzDuhvwc8J6TSIUmBEc4HEi9Z-155BJ8=.4968dd9f-3939-4a49-9f29-57a901a7d12a@github.com> References: <6OAF_6PrZCouzDuhvwc8J6TSIUmBEc4HEi9Z-155BJ8=.4968dd9f-3939-4a49-9f29-57a901a7d12a@github.com> Message-ID: On Thu, 16 May 2024 11:55:35 GMT, Jaikiran Pai wrote: >> We already do, see >> https://github.com/openjdk/jdk/pull/19213/files/1c45e5d56c429205ab8185481bc1044a86ab3bc6#diff-d05029afe6aed86f860a10901114402f1f6af4fe1e4b46d883141ab1d2a527b8R582 > > This is slightly different from what we do in the other PR for unsafe memory access where we specify the default in the launcher's help text too https://github.com/openjdk/jdk/pull/19174/files#diff-799093930b698e97b23ead98c6496261af1e2e33ec7aa9261584870cbee8a5eaR219. > > I don't have a strong opinion on this, I think either one is fine. Ah, apologies, I was looking in the wrong place. I agree that we should specify default in the launcher, as well as in the man pages. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1603233038 From mcimadamore at openjdk.org Thu May 16 12:23:44 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 16 May 2024 12:23:44 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v7] In-Reply-To: References: Message-ID: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. > > Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. 
Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Add note on --illegal-native-access default value in the launcher help ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/1c45e5d5..3a0db276 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From jsjolen at openjdk.org Thu May 16 12:30:43 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 16 May 2024 12:30:43 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Add corresponding tests to visit_in_order when applicable - Remove usage of auto in tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/961d89ca..d546e26c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=88 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=87-88 Stats: 47 lines in 2 files changed: 30 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From aph at openjdk.org Thu May 16 12:43:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 16 May 2024 12:43:04 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 10 May 2024 12:38:46 GMT, Andrew Haley wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>> Version                                              Baseline                This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                  (size)  Mode  Cnt      Score     Error       Score     Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060       1.247 ±   0.062  ns/op
>> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028       4.387 ±   0.015  ns/op
>> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051      26.655 ±   0.097  ns/op
>> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352    2649.962 ± 216.744  ns/op
>> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062       1.246 ±   0.054  ns/op
>> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002       5.344 ±   0.003  ns/op
>> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048      23.023 ±   0.142  ns/op
>> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374    2410.504 ±   8.872  ns/op
>> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005       1.187 ±   0.001  ns/op
>> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002       5.676 ±   0.001  ns/op
>> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016      24.378 ±   0.006  ns/op
>> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336    2419.015 ±   0.492  ns/op
>> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001       1.037 ±   0.001  ...
> > Hi, > >> I can update the patch with current results on Monday and we could decide how to proceed with this PR after that. Sounds good? > > Yes, that's right. > Hi @theRealAph ! You may find the latest version here: [mikabl-arm at b3db421](https://github.com/mikabl-arm/jdk/commit/b3db421c795f683db1a001853990026bafc2ed4b) . I gave a short explanation in the commit message, feel free to ask for more details if required. > > Unfortunately, it still contains critical bugs and I won't be able to take a look into the issue before the next week at best. Until it's fixed, it's not possible to run the benchmarks.
Although I expect it to improve performance on longer integer arrays based on a benchmark I've written in C++ and Assembly. The results aren't comparable to the jmh results, so I won't post them here. OK. One small thing, I think it's possible to rearrange things a bit to use `mlav`, which may help performance. No need for that until the code is correct, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2115145669 From jsjolen at openjdk.org Thu May 16 12:58:12 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 16 May 2024 12:58:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 12:30:43 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Add corresponding tests to visit_in_order when applicable > - Remove usage of auto in tests Some basic code coverage info: - nmtTreap: 99.2% - nmtNativeCallStackStorage.hpp: 84.6% - vmatree.cpp: 97.6% - vmatree.hpp: 87.8% I'll go through the specific coverage and see if I can find some more places to get some inspiration for tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2115177754 From tschatzl at openjdk.org Thu May 16 13:10:02 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 16 May 2024 13:10:02 GMT Subject: RFR: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects [v2] In-Reply-To: References: <0OdHsQmnM80KQib8u-yWtCSCejCTIK8lJ_bpLk3O_9E=.d727d825-882e-4574-84d9-6a908138066c@github.com> Message-ID: <4qPcSxzNKXkLEbngTDAw9CWG2QYQhXmEKlC6wQPbWwA=.5bc878b0-3818-4130-b667-f8b9e652267f@github.com> On Thu, 9 May 2024 09:34:02 GMT, Liang Mao wrote: > > > > > > There is only 1 store that g1_can_remove_pre_barrier return false and was elided by this PR in JBB. > > > > > > Okay. That's what I expected. Given that we are about to remove all of this code in favour of more robust late barrier expansion, I feel like we can live without that one extra store barrier for now. > > ok. That's fairly reasonable. Could you please withdraw this PR so that it does not show up in the list of open PRs?
Thanks, Thomas ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19098#discussion_r1603312243 From coleenp at openjdk.org Thu May 16 14:20:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 16 May 2024 14:20:04 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: References: Message-ID: <4lTTfkHuxga5Lf4mNyVXRaAp5RB1O9bx7Ltptb3ZcDU=.14022077-357f-4073-af77-3c1da8db0172@github.com> On Thu, 16 May 2024 07:22:20 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/interpreter/oopMapCache.cpp line 545: >> >>> 543: >>> 544: // First search for an empty slot >>> 545: for (int i = 0; i < _probe_depth; i++) { >> >> Does the GlobalCounter read barrier belong around this too? > > I don't think so: GlobalCounter guards against the reclamation of `OopMapCacheEntry`-es, so we only need to protect the paths that access their contents. We don't need it for anything else, like just poking into the array slots here. I used to have the critical section that spans this entire method, but reasoned it was excessive. Ok, I concur. It looks ok since the oopMapCache is an array so the search for the null container won't search through next pointers of entries that could be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1603438240 From coleenp at openjdk.org Thu May 16 14:20:03 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 16 May 2024 14:20:03 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 07:55:28 GMT, Aleksey Shipilev wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. 
Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. >> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. >> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Notify service thread on first enqueue This looks good. src/hotspot/share/interpreter/oopMapCache.cpp line 584: > 582: MutexLocker ml(Service_lock, Mutex::_no_safepoint_check_flag); > 583: Service_lock->notify_all(); > 584: } Yes, you don't want to do this too frequently. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19229#pullrequestreview-2060914518 PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1603439246 From stuefe at openjdk.org Thu May 16 15:25:18 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 May 2024 15:25:18 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 12:30:43 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`.
Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> ``` >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately.
When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Add corresponding tests to visit_in_order when applicable > - Remove usage of auto in tests src/hotspot/share/utilities/nativeCallStack.hpp line 57: > 55: > 56: class NativeCallStack : public StackObj { > 57: friend class VMATreeTest; I am surprised friend is needed, the private section of this class being so tiny. What does friend give you what you could not get via normal accessors? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1603590027 From shade at openjdk.org Thu May 16 15:29:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 May 2024 15:29:02 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 14:17:03 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Notify service thread on first enqueue > > src/hotspot/share/interpreter/oopMapCache.cpp line 584: > >> 582: MutexLocker ml(Service_lock, Mutex::_no_safepoint_check_flag); >> 583: Service_lock->notify_all(); >> 584: } > > Yes, you don't want to do this too frequently. Well, tests pass with this change, but now I am thinking if we would eventually run into any lock ranking problem here.
At very least `stackwatermark` is ranked above `service`, so we are safe for concurrent GCs. There are only a few locks that are ranked below `service`, so maybe I am overthinking this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1603597310 From coleenp at openjdk.org Thu May 16 17:50:02 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 16 May 2024 17:50:02 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: References: Message-ID: <-tSv8ySxibUCWI0vT1FMkA7zP5iL8AfQSsaQKw_bAMs=.ece60fa5-e7c2-4e53-a9dd-e8066474b3c3@github.com> On Thu, 16 May 2024 15:26:25 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/interpreter/oopMapCache.cpp line 584: >> >>> 582: MutexLocker ml(Service_lock, Mutex::_no_safepoint_check_flag); >>> 583: Service_lock->notify_all(); >>> 584: } >> >> Yes, you don't want to do this too frequently. > > Well, tests pass with this change, but now I am thinking if we would eventually run into any lock ranking problem here. At very least `stackwatermark` is ranked above `service`, so we are safe for concurrent GCs. There are only a few locks that are ranked below `service`, so maybe I am overthinking this? It is a low level lock, I think it'll be ok, you could check out some call stacks but the tests should find these lock inversions if they exist (famous last words). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1603789260 From alanb at openjdk.org Thu May 16 18:43:06 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 16 May 2024 18:43:06 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 12:23:44 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add note on --illegal-native-access default value in the launcher help src/java.base/share/classes/java/lang/System.java line 2023: > 2021: * @throws NullPointerException if {@code filename} is {@code null} > 2022: * @throws IllegalCallerException If the caller is in a module that > 2023: * does not have native access enabled. The exception description is fine, just noticed the other exception descriptions start with a lowercase "if", this one is different. src/java.base/share/man/java.1 line 587: > 585: \f[V]deny\f[R]: This mode disables all illegal native access except for > 586: those modules enabled by the \f[V]--enable-native-access\f[R] > 587: command-line option. "This mode disable all illegal native access except for those modules enabled the --enable-native-access command-line option". This can be read to mean that modules granted native access with the command line option is also illegal native access An alternative is to make the second part of the sentence a new sentence, something like "Only modules enabled by the --enable-native-access command line option may perform native access. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1603878829 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1603875920 From cjplummer at openjdk.org Thu May 16 19:29:05 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 16 May 2024 19:29:05 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers In-Reply-To: References: Message-ID: On Thu, 16 May 2024 07:57:58 GMT, Alan Bateman wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with `nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > src/hotspot/share/prims/jvmti.xml line 1008: > >> 1006: function descriptions. Empty lists, arrays, sequences, etc are >> 1007: returned as nullptr which is C programming language >> 1008: null pointer. > > Shouldn't this be "NULL"? In any case, I think it would be helpful to expand this a bit to make it clear that usages of "nullptr" in parameter and error descriptions should be read or treated as "NULL" when developing an agent in C rather than C++. Yes, I think it should be NULL. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1603929609 From dcubed at openjdk.org Thu May 16 19:56:08 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 May 2024 19:56:08 GMT Subject: RFR: 8330969: scalability issue with loaded JVMTI agent [v2] In-Reply-To: References: Message-ID: <1riS2rxi8S3odLFxNcMLJloR00dRiKKpEQnd7uIGxEw=.590cba72-b756-4bab-9522-59ff0613140b@github.com> On Wed, 15 May 2024 06:00:46 GMT, Serguei Spitsyn wrote: >> I'm not sure this answered Chris' query properly. Or I'm reading Chris' query wrong.
>> >> Perhaps this is not what Chris had in mind, but I'm wondering what happens in some >> Thread-A when it is checked and passed by but then Thread-A sets the flag in itself >> after the for-loop has passed it by. Does that Thread-A flag value get lost? > >> Perhaps this is not what Chris had in mind, but I'm wondering what happens in some >> Thread-A when it is checked and passed by but then Thread-A sets the flag in itself >> after the for-loop has passed it by. Does that Thread-A flag value get lost? > > Thank you for the question. > The Thread-A sets the flag optimistically and then re-checks if `sync_protocol_enabled()` and any disabler exists. It can be global disabler (`_VTMS_transition_disable_for_all_count > 0`) or disabler of `Thread-A` only (`java_lang_Thread::VTMS_transition_disable_count(vth()) > 0`). If any disabler exists then `Thread-A` clears the optimistic settings and goes with the pessimistic approach under protection of `JvmtiVTMSTransition_lock`. > > Please, let me know if you still have questions. This algorithm sounds correct. Thanks for closing the loop on my belated comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18937#discussion_r1603957324 From sgibbons at openjdk.org Thu May 16 20:57:12 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 16 May 2024 20:57:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Tue, 7 May 2024 17:25:04 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4492: > >> 4490: >> 4491: // Compare char[] or byte[] arrays aligned to 4 bytes or substrings. >> 4492: void C2_MacroAssembler::arrays_equals(bool is_array_equ, Register ary1, > > I liked the old style better, fewer longer lines.. same for rest of the changes in this file. Done.
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4594: > >> 4592: #endif //_LP64 >> 4593: bind(COMPARE_WIDE_VECTORS); >> 4594: vmovdqu(vec1, Address(ary1, limit, > > create a local scale variable instead of ternary operators. Used several times. Done > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > >> 4248: generate_chacha_stubs(); >> 4249: >> 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { > > Just `if (EnableX86ECoreOpts)`? I think all 3 should be specified, even if `EnableX86ECoreOpts` checks. This is for clarity of intent. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 391: > >> 389: } >> 390: >> 391: __ cmpq(needle_len, isU ? 2 : 1); > > Can we remove this comparison? i.e. > - broadcast first and last character unconditionally (same character). Or > - move broadcasts 'down' into individual cases.. > There is already specialized code to handle needle of size 1.. This adds extra pathlength. (Will we actually call this intrinsic for needle_size==1? Assume length>=2?) At this point in the code it is entirely possible for needle size to be == 1, but only in the case where haystack size is > 32 bytes. Moving the broadcasts 'down' into individual cases increases code size by 14 broadcast instructions. Seems like the best option is to just remove the compare and branch, broadcasting the first needle element twice. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1365: > >> 1363: // Compare first byte of needle to haystack >> 1364: vpcmpeq(cmp_0, byte_0, Address(haystack, 0), Assembler::AVX_256bit); >> 1365: if (size != (isU ? 2 : 1)) { > > `if (size != scale)` > > Though in this case, `elem_size` might hold more meaning. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1372: > >> 1370: >> 1371: if (bytesToCompare > 2) { >> 1372: if (size > (isU ? 4 : 2)) { > > `if (size > 2*scale)`? 
Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1373: > >> 1371: if (bytesToCompare > 2) { >> 1372: if (size > (isU ? 4 : 2)) { >> 1373: if (doEarlyBailout) { > > Is there a big perf difference when `doEarlyBailout` is enabled? And/or just for this function? > > (i.e. removing `doEarlyBailout` in this function will mean less pathlength. Feels like a few extra vpands should be cheap enough.) I removed the macro DO_EARLY_BAILOUT and assumed it to be true. There's not much difference (if any) in performance, so we maybe ought to consider not bailing out early. I'll consider not bailing out and let you know how performance is impacted. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1469: > >> 1467: >> 1468: if (isU && (size & 1)) { >> 1469: __ emit_int8(0xcc); > > This should also be an `assert()` to catch this at compile-time. Although assert is technically runtime (;-)) I'll change these. They were put in to double-check while debugging. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1633: > >> 1631: if (isU) { >> 1632: if ((size & 1) != 0) { >> 1633: __ emit_int8(0xcc); > > Compile-time assert to ensure this code is never called instead? Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1889: > >> 1887: // r13 = (needle length - 1) >> 1888: // r14 = &needle >> 1889: // r15 = unused > > There is quite a bit of redundancy in register usage. Its not incorrect, but looks odd. Not clear if this duplication can easily be removed (or if/why needed). > > // rbx = &haystack > // rdi = &haystack > // rdx = &needle > // r14 = &needle > // rcx = haystack length > // rsi = haystack length > // r12 = needle length > // r13 = (needle length - 1) > // r10 = hs_len - needle len > // rbp = -1 > > // rax = unused > // r11 = unused > // r8 = unused > // r9 = unused > // r15 = unused > > > (Could this comment be out-of-sync with the code? 
Looks like only rbx, r14 and temps out of unused registers are used few lines down) This comment provides the full register state upon entry to each of the cases of the switch. The duplication is an artifact of the decisions made in setup code (like checking ranges, etc.). Each case can depend on the values of the registers to be as documented on entry. It can use either rcx or rsi to get the haystack length, for example. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1950: > >> 1948: // r13 = (needle length - 1) >> 1949: // r14 = &needle >> 1950: // r15 = unused > > Same as for the small case Yes, same as for the small case. > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 2: > >> 1: /* >> 2: * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved. > > New file, just 2024 Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603734868 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603735274 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603737342 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603806354 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603953047 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603985462 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603955117 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603956554 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603989550 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604006660 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604006994 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604024770 From sgibbons at openjdk.org Thu May 16 20:57:20 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 16 May 2024 20:57:20 GMT Subject: RFR: 8320448: Accelerate 
IndexOf using AVX2 [v19] In-Reply-To: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Mon, 6 May 2024 20:56:36 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1174: > >> 1172: // Alignment specifying the maximum number of allowed bytes to pad. >> 1173: // If padding > max, no padding is inserted. >> 1174: void MacroAssembler::p2align(int modulus, int maxbytes) { > > We could pass offset() as an argument to p2align. Basically have three arguments to p2align(modulus, target, maxbytes). Also maybe rename p2align as align then? Removed p2align(). Was never used and was a remnant of prior implementation attempt. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 208: > >> 206: //////////////////////////////////////////////////////////////////////////////////////// >> 207: //////////////////////////////////////////////////////////////////////////////////////// >> 208: if (VM_Version::supports_avx2()) { // AVX2 version > > Instead of the if check here, it would be better to do an assert here: > assert (VM_Version::supports_avx2(), "Needs AVX2 support"); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: > >> 236: const Register needle = rdx; >> 237: const Register needle_len = rcx; >> 238: > > This is the calling convention on Linux. How is windows platform handled? The entry code switches Windows calling convention into Linux calling convention by moving/saving registers, which are properly restored on function exit. This makes register tracking easier. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 260: > >> 258: // const XMMRegister save_rcx = xmm11; >> 259: // const XMMRegister save_r8 = xmm12; >> 260: > > This could be removed? Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 279: > >> 277: fnptrs[isLL ? StrIntrinsicNode::LL >> 278: : isUU ? StrIntrinsicNode::UU >> 279: : StrIntrinsicNode::UL] = __ pc(); > > Could this not be simplified as: > fnptrs[ae] = __ pc(); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314: > >> 312: >> 313: // needle_len is in elements, not bytes, for UTF-16 >> 314: __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX); > > OPT_NEEDLE_SIZE_MAX is an odd number (set to 5), should that have been an even number? Removed OPT_NEEDLE_SIZE_MAX and replaced with constant == 6. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 383: > >> 381: { >> 382: Label L_short; >> 383: > > A comment here: > // Broadcast the beginning of needle into a vector register. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 390: > >> 388: __ vpbroadcastb(byte_0, Address(needle, 0), Assembler::AVX_256bit); >> 389: } >> 390: > > A comment here: > // Broadcast the end of needle into a vector register. This step is not needed for single element needle. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 418: > >> 416: __ cmpq(haystack_len, 0x10); >> 417: __ ja_b(L_moreThan16); >> 418: > > An assert here to check for header size >= 16 would be good. > Also a comment here would he good, something like: > // Copy 16 or 32 bytes prior to haystack end onto stack > // This will possibly including some object header bytes when haystack length is less than 16 or 32 bytes // Set the new haystack address to beginning of copied haystack on stack adjusting for extra bytes copied I don't know how to assert header size >= 16 bytes, so I'll add a comment stating such. 
If you can tell me how to assert, I'll add that code in place of the comment. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 498: > >> 496: >> 497: // big_case_loop_helper will fall through to this point if one or more potential matches are found >> 498: // The mask will have a bitmask indicating the position of the potential matches within the haystack > > If no potential match, which label does the big_case_loop_helper jump to? Added comment > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 517: > >> 515: __C2 arrays_equals(false, haystackStart, firstNeedleCompare, compLen, retval, rScratch, xmm_tmp3, xmm_tmp4, >> 516: false /* char */, knoreg); >> 517: __ testl(retval, retval); > > Since this is byte compare even for isU, the retval here could be a 64-bit quantity so the testl should be a testq. `arrays_equals` returns a boolean value of `0` for not found and `1` for found using `movl(result, 0/1)` so testl is appropriate here. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 553: > >> 551: // Haystack always copied to stack, so 32-byte reads OK >> 552: // Haystack length < 32 >> 553: // 10 < needle length < 32 > > The comment below may need update as we come here for needle_len > OPT_NEEDLE_SIZE_MAX which is currently set as 5: > // 10 < needle length < 32 No. The jump is based on NUMBER_OF_CASES which is == 10. See line 147. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 576: > >> 574: broadcast_additional_needles(false, 0 /* unknown */, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, needle, needleLen, rTmp3, >> 575: isUU, isUL, _masm); >> 576: > > Good to pass output xmm registers to this method. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 587: > >> 585: // firstNeedleCompare has address of second element of needle >> 586: // compLen has length of comparison to do >> 587: > > This is not clear. 
firstNeedleCompare gets needle + NUMBER_OF_NEEDLE_BYTES_TO_COMPARE - 1 which is not necessarily the second element of needle. If it helps let us fix the NUMBER_OF_NEEDLE_BYTES_TO_COMPARE to 3 and have comments and code versus that only. Replaced NUMBER_OF_NEEDLE_BYTES_TO_COMPARE with constant `3` > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 611: > >> 609: __C2 arrays_equals(false, rTmp, firstNeedleCompare, compLen, rTmp3, rTmp2, xmm_tmp3, xmm_tmp4, false /* char */, >> 610: knoreg); >> 611: __ testl(rTmp3, rTmp3); > > Since this is byte compare even for isU, the rtmp3 here could be a 64-bit quantity so the testl should be a testq. `arrays_equals` returns boolean via `movl(retval, 0/1)` so testl is appropriate here. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 629: > >> 627: >> 628: __ bind(L_returnError); >> 629: __ movq(rbp, -1); > > This could directly be rax instead of intermediate rbp and then moving from rbp to rax. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 633: > >> 631: >> 632: __ bind(L_returnZero); >> 633: __ xorl(rbp, rbp); > > This could directly be rax instead of intermediate rbp and then moving from rbp to rax. Removed block - never jumped to. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 639: > >> 637: __ movl(rax, r8); >> 638: __ subq(rcx, rbx); >> 639: __ addq(rcx, rax); > > This could be: > __ subq(rcx, rbx); > __ addq(rcx, r8); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 647: > >> 645: __ cmpq(r11, r10); >> 646: __ movq(rbp, -1); >> 647: __ cmovq(Assembler::belowEqual, rbp, r11); > > This could be directly computed in rax: > __ movq(rax, -1); > __ cmovq(Assembler::belowEqual, rax, r11); > Also is it possible to not do cmov on some paths? It is an expensive operation. 
OK > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1010: > >> 1008: static void broadcast_additional_needles(bool sizeKnown, int size, int bytesToCompare, Register needle, >> 1009: Register needleLen, Register rTmp, bool isUU, bool isUL, >> 1010: MacroAssembler *_masm) { > > Good to add output XMM registers to the parameter list. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1040: > >> 1038: __ vpbroadcastb(byte_1, Address(needle, 1), Assembler::AVX_256bit); >> 1039: } >> 1040: } > > It will be good to have a function which broadcasts a needle element from a given offset into a vector register. > That function could take (needle address, offset, outout vector register, temps). > Such a function could then be called twice from here and from main function for offset 0. No longer relevant - always comparing 3 needle bytes only, so the second broadcast is gone. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1054: > >> 1052: } else if (isUL) { >> 1053: __ movzbl(rTmp, Address(needle, 2)); >> 1054: __ movdl(byte_1, rTmp); > > Should be: __ movdl(byte_2, rTmp); Removed byte_2 - always comparing 3 bytes. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1056: > >> 1054: __ movdl(byte_1, rTmp); >> 1055: // 1st byte of needle in words >> 1056: __ vpbroadcastw(byte_1, byte_1, Assembler::AVX_256bit); > > Should be: > __ vpbroadcastw(byte_2, byte_2, Assembler::AVX_256bit); Removed byte_2 - always comparing 3 bytes. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1082: > >> 1080: // noMatch - label bound outside to jump to if there is no match >> 1081: // haystack - the address of the first byte of the haystack >> 1082: // hsLen - the sizeof the haystack > > Good to specify if the size (size of needle) and hsLen (size of haystack) is in bytes or elements. In bytes. 
added > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1083: > >> 1081: // haystack - the address of the first byte of the haystack >> 1082: // hsLen - the sizeof the haystack >> 1083: // isU - true if argument encoding is either UU or UL > > We need to list needleLen here as well? Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1096: > >> 1094: MacroAssembler *_masm) { >> 1095: >> 1096: assert_different_registers(eq_mask, haystack, needleLen, rTmp, hsLen, r10); > > r10 kind of stands out here. You could say nMinusK in this assert. > The assert following to this one is checking for nMinusK==r10 so that should suffice. > BTW, didn't see anything in the code below that needs nMinuxK to be r10. r10 holds the value `(n - k)` always, which is used to ensure the returned index is not past the end of the haystack. I will annotate this register as global in comments. I also reserve xmm0, xmm1, and xmm12 to hold the broadcasted needle bytes globally. I'll try to make this as obvious as possible. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1120: > >> 1118: #define cmp_0 XMM_TMP3 >> 1119: #undef cmp_k >> 1120: #define cmp_k XMM_TMP4 > > XMM_TMP4 is not reused so cmp_k could be declared as const. In general limiting undef/define pair only to reused registers would make the review easier. OK. I'll handle this as a last pass over the code for register allocation. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1125: > >> 1123: #undef lastMask >> 1124: >> 1125: int sizeIncr = isU ? 2 : 1; > > sizeIncr and scale seems to be same, we could just use one of them in this function. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1149: > >> 1147: >> 1148: if (size == (isU ? 2 : 1)) { >> 1149: __ vpmovmskb(eq_mask, cmp_0, Assembler::AVX_256bit); > > vpmovmskb is being done twice if doEarlyBailout is set to 1 (the setting we have currently). 
> If it helps to simplify, we could assume that doEarlyBailout is always set to 1 and remove this configurability. Fixed with removal of DO_EARLY_BAILOUT > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1174: > >> 1172: #define lastMask rTmp >> 1173: __ vpmovmskb(lastMask, cmp_k, Assembler::AVX_256bit); >> 1174: __ shrq(lastMask); > > did you mean to shift the lastMask by shiftVal here? The whole machination around saving/restoring rcx here was to shift by cl. The code emitted by this instruction is: `0x00007fffe463d048: 48 d3 ea shr rdx,cl` which is what is desired. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1183: > >> 1181: >> 1182: if (bytesToCompare > 2) { >> 1183: if (size > (isU ? 4 : 2)) { > > this and other usages could be simplified to: size > 2 * scale Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1185: > >> 1183: if (size > (isU ? 4 : 2)) { >> 1184: if (doEarlyBailout) { >> 1185: __ testl(eq_mask, eq_mask); > > The masks are 32 bit as we are comparing max 32 byes (256 bits) at a time. So we could consistently do either andl, testl, shrl or andq, testq, shrq. Changed to `l` variant > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1476: > >> 1474: _masm); >> 1475: >> 1476: __ movq(r11, -1); > > There doesn't seem to be a use of r11 below in this function. r11 is used in exit code as the pointer to the haystack byte that matches. Setting to `-1` will always be past the end of any haystack and return an error. The helper after this call makes that assumption. This is another of the "pseudo-global" registers. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1493: > >> 1491: // Assume r10 is n - k >> 1492: __ leaq(last, Address(haystack, r10, Address::times_1, isU ? -30 : -31)); >> 1493: __ jmpb(temp); > > Need to pass r10 as parameter. Also temp label could be given a better name. 
Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1502: > >> 1500: >> 1501: __ cmpq(hsPtrRet, last); >> 1502: __ cmovq(Assembler::aboveEqual, hsPtrRet, last); > > cmovq is expensive, better sequence would be: > > __ cmpq(hsPtrRet, last); > __ jb_b(temp); > __ movq(hsPtrRet, last); Done > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510: > >> 1508: compare_big_haystack_to_needle(sizeKnown, size, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength, >> 1509: needleLen, isU, DO_EARLY_BAILOUT, eq_mask, temp2, r10, _masm); >> 1510: > > At this point hsLength is not the remaining length from hsPtrRet, would that cause a problem? If not, all the special paths in compare_big_haystack_to_needle need not be generated on this call. Not sure what you mean here. I *think* you mean that hsLength is not the length of the remaining bytes in the haystack, but the actual length. There may be an issue if that is correct, right? I'll investigate. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1589: > >> 1587: case 3: >> 1588: case 4: >> 1589: __ movl(needleVal, Address(needle, offsetOfFirstByteToCompare)); > > If the size of the needle is 7 and it is an LL case with NUMBER_OF_NEEDLE_BYTES_TO_COMPARE set as 3: > bytesLeftToCompare = 4 (i.e. 7-3); > offsetOfFirstByteToCompare = 2 (i.e. 3-1); > the movl will be loading bytes 2,3,4,5 > So we seem to be missing loading the last byte of the needle. Is that correct? Bytes 0, 1, and 6 have already compared equal before getting to this code, so it is correct functionally. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1735: > >> 1733: // generated with 32 - (n - k + 1) bits set that ensures matches past the end of the original >> 1734: // haystack do not get considered during compares. >> 1735: // > > Mask is generated below with (n-k+1) bits set and not 32- (n-k+1) bits set. Also it will be helpful if we specify what is n and k. Thanks. Fixed. 
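For readers following the mask discussion above, here is a minimal scalar sketch of a 32-bit mask with the low (n - k + 1) bits set. It is not the stub's code. It assumes n is the haystack length and k the needle length, both in bytes (inferred from the "r10 holds (n - k)" remark in this thread), and `valid_start_mask` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only (not part of the HotSpot stub).
// Returns a 32-bit mask whose low (n - k + 1) bits are set, one bit per
// position at which a needle of k bytes can still start inside a haystack
// of n bytes. Assumes 0 < k <= n.
inline uint32_t valid_start_mask(uint32_t n, uint32_t k) {
  assert(k > 0 && k <= n);
  uint32_t count = n - k + 1;          // number of valid start positions
  if (count >= 32) {
    return 0xFFFFFFFFu;                // every lane of a 32-byte compare is valid
  }
  return (uint32_t{1} << count) - 1;   // low 'count' bits set
}
```

ANDing such a mask with a `vpmovmskb`-style equality mask would drop candidate matches that start past the last valid position, which appears to be the intent of the comment being corrected.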
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1838: > >> 1836: __ shrq(rax, 1); >> 1837: } >> 1838: > > We need to be consistent either use tzcntl, shrl, testl or tzcntq, shrq, testq. I'll search through the code making them all consistent. > src/hotspot/share/opto/library_call.cpp line 1263: > >> 1261: if (result != nullptr) { >> 1262: // The result is index relative to from_index if substring was found, -1 otherwise. >> 1263: // Generate code which will fold into cmove. > > Any reason to remove this comment? No reason - cut/paste error. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603736399 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603740677 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603743601 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603752052 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603752276 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603752936 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603780784 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603780997 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603816022 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603833467 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603846748 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603855986 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603864665 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603865621 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603866807 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603868917 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603869305 PR Review Comment: 
https://git.openjdk.org/jdk/pull/16753#discussion_r1603884368 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603889410 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603895505 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603896809 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603897475 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603897759 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603903738 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603906289 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603914822 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603917518 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603922652 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603924998 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603939571 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603949004 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603951974 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603966864 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603974757 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603969211 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603985006 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603989357 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603990826 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1603999938 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604012121 From sgibbons at openjdk.org Thu May 16 20:57:25 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 16 May 2024 20:57:25 GMT Subject: RFR: 
8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 12:09:11 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 > > src/hotspot/share/opto/library_call.cpp line 1229: > >> 1227: } else { >> 1228: result = make_indexOf_node(src_start, src_count, tgt_start, tgt_count, >> 1229: result_rgn, result_phi, ae); > > Existing routines emit IR to handle the following special cases: > > tgt_cnt > src_cnt return -1 > tgt_cnt == 0 return 0. > > Should we not preserve those checks before calling the stub? > > As of now these checks are part of the stub, and doing them in JIT code will save call overhead. Working on this. Trying to develop my IR chops. However, this is optimizing for a very small percentage of calls, so there will be no noticeable effect on overall performance. There will only be savings for calls that have needle length == 0 (probably zero calls do this) or haystack length < needle length (maybe, but highly unlikely).
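The two special cases under discussion can be sketched as a scalar model. This is only an illustration of the checks, not the IR that `library_call.cpp` emits or the stub's code; `index_of_model` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical scalar model of indexOf over byte strings, for illustration.
// src/src_cnt describe the haystack, tgt/tgt_cnt the needle.
inline int index_of_model(const char* src, int src_cnt,
                          const char* tgt, int tgt_cnt) {
  // Special case 1: the needle is longer than the haystack, no match possible.
  if (tgt_cnt > src_cnt) return -1;
  // Special case 2: an empty needle matches at index 0.
  if (tgt_cnt == 0) return 0;
  // General case: try every start position that leaves room for the needle.
  for (int i = 0; i + tgt_cnt <= src_cnt; i++) {
    if (std::memcmp(src + i, tgt, static_cast<std::size_t>(tgt_cnt)) == 0) {
      return i;
    }
  }
  return -1;
}
```

Whether the first two checks run in JIT-generated code or inside the stub does not change the result; it only decides whether those rare cases pay for a call.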
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1604010493 From gziemski at openjdk.org Thu May 16 21:07:11 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 16 May 2024 21:07:11 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 12:30:43 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and commit memory to them. The basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> ``` >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment.
>> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Add corresponding tests to visit_in_order when applicable > - Remove usage of auto in tests src/hotspot/share/nmt/nmtTreap.hpp line 175: > 173: #ifdef ASSERT > 174: void verify_self() { > 175: const double expected_maximum_depth = log(this->_node_count+1) * 5; Where did 5 come from? Shouldn't this be: `const double expected_maximum_depth = log(this->_node_count+1) * (this->_node_count+1)`?
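As an aid to this question, the sketch below compares the two candidate bounds against the deepest tree that n nodes can form, a chain of depth n. It assumes the check is meant to catch degenerate (list-like) treaps and that `log` is the natural logarithm; the function names are hypothetical and the constant 5 is taken from the quoted line without any claim about why it was chosen.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Bound as written in the quoted code: log(n + 1) * 5.
inline double bound_constant_factor(std::size_t n) {
  return std::log(static_cast<double>(n + 1)) * 5;
}

// Bound as proposed in the question: log(n + 1) * (n + 1).
inline double bound_linear_factor(std::size_t n) {
  return std::log(static_cast<double>(n + 1)) * static_cast<double>(n + 1);
}

// A tree with n nodes is at most n deep (a chain). A depth check can only
// flag such a tree if that worst-case depth exceeds the bound.
inline bool can_detect_degenerate(double bound, std::size_t n) {
  return static_cast<double>(n) > bound;
}
```

For n = 1000 the first bound is about 34.5, so a chain of 1000 nodes would be flagged. The second bound is about 6915, which is larger than any depth 1000 nodes can reach, so under these assumptions a check using it could never fail.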
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1604038737 From kbarrett at openjdk.org Thu May 16 23:38:04 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 16 May 2024 23:38:04 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers In-Reply-To: References: Message-ID: On Thu, 16 May 2024 02:37:40 GMT, Serguei Spitsyn wrote: > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. > > Testing: N/A (not needed) Changes requested by kbarrett (Reviewer). src/hotspot/share/prims/jvmti.xml line 1008: > 1006: function descriptions. Empty lists, arrays, sequences, etc are > 1007: returned as nullptr which is C programming language > 1008: null pointer. Perhaps instead something like "returned as a null pointer (C NULL or C++ nullptr)." "null pointer" is the generic phrase used in both the C and C++ standards. ------------- PR Review: https://git.openjdk.org/jdk/pull/19257#pullrequestreview-2059896023 PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1602805633 From sviswanathan at openjdk.org Thu May 16 23:38:08 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 16 May 2024 23:38:08 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9] In-Reply-To: References: Message-ID: <9aJy6ON5gSI5ihwK-WkvnyrtHjJTPN5IAFymf1Jpp9M=.32b8ee27-465d-47d8-9099-22cb846cff9a@github.com> On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski wrote: >> Performance. 
Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ? 6.491 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ? 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ? 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ? 45.487 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ? 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ? 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ? 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ? 42.420 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ? 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ? 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ? 35.920 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ? 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ? 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57... 
> > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > whitespace src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 168: > 166: XMMRegister broadcast5 = xmm24; > 167: KRegister limb0 = k1; > 168: KRegister limb5 = k2; limb5 and select are not being used anymore. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 185: > 183: __ evmovdquq(modulus, allLimbs, ExternalAddress(modulus_p256()), false, Assembler::AVX_512bit, rscratch); > 184: > 185: // A = load(*aLimbs) A little bit more description in comments on what the load step involves would be helpful. e.g. Load upper 4 limbs, shift left by 1 limb using perm, or in the lowest limb. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 270: > 268: __ push(r14); > 269: __ push(r15); > 270: No need to save/restore rbx, r12, r14, r15. Only r13 is used as temp in montgomeryMultiply(aLimbs, bLimbs, rLimbs). That too could be easily changed to r8. src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 286: > 284: __ mov(aLimbs, c_rarg0); > 285: __ mov(bLimbs, c_rarg1); > 286: __ mov(rLimbs, c_rarg2); We could directly call montgomeryMultiply(c_rarg0, c_rarg1, c_rarg2) then these moves are not necessary. src/hotspot/cpu/x86/vm_version_x86.cpp line 1370: > 1368: > 1369: #ifdef _LP64 > 1370: if (supports_avx512ifma() && supports_avx512vlbw() && MaxVectorSize >= 64) { No need to tie the intrinsic to MaxVectorSize setting. src/hotspot/share/opto/library_call.cpp line 7564: > 7562: > 7563: if (!stubAddr) return false; > 7564: if (stopped()) return true; Line 7564 seems redundant here as there is no range check or anything like that before this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604169603 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604141586 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604174141 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1604175443 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1603792252 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1603865712 From sspitsyn at openjdk.org Fri May 17 00:43:18 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 May 2024 00:43:18 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: References: Message-ID: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. 
> > Testing: N/A (not needed) Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: corrected the nullptr clarification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19257/files - new: https://git.openjdk.org/jdk/pull/19257/files/fd0e8d43..9fe639e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19257/head:pull/19257 PR: https://git.openjdk.org/jdk/pull/19257 From sspitsyn at openjdk.org Fri May 17 00:43:19 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 May 2024 00:43:19 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 07:59:51 GMT, Kim Barrett wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: corrected the nullptr clarification > > src/hotspot/share/prims/jvmti.xml line 1008: > >> 1006: function descriptions. Empty lists, arrays, sequences, etc are >> 1007: returned as nullptr which is C programming language >> 1008: null pointer. > > Perhaps instead something like > > "returned as a null pointer (C NULL or C++ nullptr)." > > "null pointer" is the generic phrase used in both the C and C++ standards. Thank you, Kim. I like this suggestion. Updated now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1604210615 From sspitsyn at openjdk.org Fri May 17 00:43:19 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 May 2024 00:43:19 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 19:26:07 GMT, Chris Plummer wrote: >> src/hotspot/share/prims/jvmti.xml line 1008: >> >>> 1006: function descriptions. Empty lists, arrays, sequences, etc are >>> 1007: returned as nullptr which is C programming language >>> 1008: null pointer. >> >> Shouldn't this be "NULL"? In any case, I think it would be helpful to expand this a bit to make it clear that usages of "nullptr" in parameter and error descriptions should be read or treated as "NULL" when developing an agent in C rather than C++. > > Yes, I think it should be NULL. Thank you for the suggestions. I've changed it as Kim suggested below. Please let me know if it is not good enough or if some additions are needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1604211567 From lmesnik at openjdk.org Fri May 17 01:53:14 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 17 May 2024 01:53:14 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name is not safe Message-ID: The JvmtiTrace::safe_get_thread_name sometimes crashes when called while the current thread is in the native thread state. It happens when thread_name is set for tracing from jvmti functions. See: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from the vm thread_state only. Some functions like raw monitor enter/exit never transition into the vm state. So this function sometimes needs to be called from the native thread state.
The change should affect JVMTI trace mode only (-XX:TraceJVMTI). Verified by running jvmti/jdi/jdb tests with tracing enabled. ------------- Commit messages: - include updated. - 8332259 Changes: https://git.openjdk.org/jdk/pull/19275/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332259 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19275/head:pull/19275 PR: https://git.openjdk.org/jdk/pull/19275 From kbarrett at openjdk.org Fri May 17 01:55:08 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 17 May 2024 01:55:08 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> References: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> Message-ID: On Fri, 17 May 2024 00:43:18 GMT, Serguei Spitsyn wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: corrected the nullptr clarification Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19257#pullrequestreview-2062184501 From dholmes at openjdk.org Fri May 17 02:03:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 17 May 2024 02:03:02 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> References: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> Message-ID: <_CuYvr39rfebBcJRO0AM-2p8yQ2-V0oboFclyxAJ7Mo=.8cdba311-3f93-4c95-ac8b-6d7d41d88e24@github.com> On Fri, 17 May 2024 00:43:18 GMT, Serguei Spitsyn wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: corrected the nullptr clarification But this clarification doesn't actually clarify that the rest of the spec uses `nullptr`. Based on the proposed wording I would expect things like: The function may return nullptr to say The function may return a null pointer ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19257#pullrequestreview-2062190669 From lmesnik at openjdk.org Fri May 17 02:08:34 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 17 May 2024 02:08:34 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v2] In-Reply-To: References: Message-ID: > The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. > > It happens when thread_name is set for tracing from jvmti functions. > See: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 > > The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. > > The change should affect JVMTI trace mode only (-XX:TraceJVMTI). > > Verified by running jvmti/jdi/jdb tests with tracing enabled. Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - copyrights updated. - Merge branch 'master' of https://github.com/openjdk/jdk into 8332259 - include updated. 
- 8332259 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19275/files - new: https://git.openjdk.org/jdk/pull/19275/files/a2b1942b..c534c91b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=00-01 Stats: 30226 lines in 654 files changed: 18294 ins; 7842 del; 4090 mod Patch: https://git.openjdk.org/jdk/pull/19275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19275/head:pull/19275 PR: https://git.openjdk.org/jdk/pull/19275 From lmao at openjdk.org Fri May 17 02:56:09 2024 From: lmao at openjdk.org (Liang Mao) Date: Fri, 17 May 2024 02:56:09 GMT Subject: Withdrawn: 8331711: G1 doesn't need pre write barrier for stores from new allocated objects In-Reply-To: References: Message-ID: <5-DXDH2w7qIuou7fkByb2ET73EmP7PzZxQTsvyHKF60=.754855b9-5600-467c-a16b-3de4c286c8c0@github.com> On Mon, 6 May 2024 09:32:49 GMT, Liang Mao wrote: > The pre-write barrier of G1 is used to capture objects disconnected from the marking graph, which could be unmarked (aka *white*) and stored into *black* objects, breaking the tri-color invariant. But references in newly allocated objects are created during object initialization after marking starts and can never be white. So we don't need a pre-write barrier for stores from newly allocated objects. The same mechanism is also used for barrier elimination in GenZGC. > > Additional testing: > - [x] Linux aarch64 server release/fastdebug, test/hotspot/jtreg/gc with +UseG1GC > - [x] Run several iterations of SPECjbb2015 with aggressively frequent concurrent mark This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/19098 From qamai at openjdk.org Fri May 17 03:52:01 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 17 May 2024 03:52:01 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> References: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> Message-ID: <5OI8D0PhkM19awFsxnm6RTlJkaDxkUyvW75D3q-wK0Q=.a2a0262e-9d3c-4380-aafd-e6b7cfc4393a@github.com> On Fri, 17 May 2024 00:43:18 GMT, Serguei Spitsyn wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: corrected the nullptr clarification src/hotspot/share/prims/jvmti.xml line 1007: > 1005: explicitly deallocate. This is indicated in the individual > 1006: function descriptions. Empty lists, arrays, sequences, etc are > 1007: returned as a null pointer (C NULL or C++ nullptr). This may be a little unnecessary rigor, but I believe that `nullptr` is not a null pointer. `nullptr` is the pointer literal that can be implicitly converted to a null pointer value of any pointer type and any pointer to member type. And I think the thing returned here is a null pointer, not `nullptr`. 
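The distinction drawn above can be checked with a few lines of C++. This is a sketch for illustration; `empty_list` is a hypothetical stand-in for a function that reports an empty list as a null pointer and is not a JVMTI function.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// nullptr is a literal of type std::nullptr_t. It is not itself a pointer.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");
static_assert(!std::is_pointer<decltype(nullptr)>::value,
              "std::nullptr_t is not a pointer type");

// Hypothetical stand-in: the literal converts implicitly to a null pointer
// value of the declared return type, here int*.
inline int* empty_list() { return nullptr; }

// What the caller receives is a null pointer value of type int*.
inline bool is_null_pointer(const int* p) { return p == nullptr; }
```

A caller written in C would compare the same returned value against `NULL`; the value is the same null pointer either way, which is why "null pointer" is the language-neutral wording.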
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1604313245 From kbarrett at openjdk.org Fri May 17 04:37:01 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 17 May 2024 04:37:01 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: <_CuYvr39rfebBcJRO0AM-2p8yQ2-V0oboFclyxAJ7Mo=.8cdba311-3f93-4c95-ac8b-6d7d41d88e24@github.com> References: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> <_CuYvr39rfebBcJRO0AM-2p8yQ2-V0oboFclyxAJ7Mo=.8cdba311-3f93-4c95-ac8b-6d7d41d88e24@github.com> Message-ID: On Fri, 17 May 2024 02:00:29 GMT, David Holmes wrote: > But this clarification doesn't actually clarify that the rest of the spec uses `nullptr`. Based on the proposed wording I would expect things like: > > ``` > The function may return nullptr > ``` > > to say > > ``` > The function may return a null pointer > ``` Looking at this again, I think I agree with @dholmes-ora . Some of the relevant places are text, and should be using "null pointer". Some are example code or the like. Those should be using NULL rather than nullptr, since we have this text early on: "Unless otherwise stated, all examples and declarations in this specification use the C language." I didn't find any that were described as C++ rather than C. So JDK-8324680 was somewhat mistaken about what needed to be done, and what was done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19257#issuecomment-2116606245 From alanb at openjdk.org Fri May 17 04:50:01 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 17 May 2024 04:50:01 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 00:38:07 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmti.xml line 1008: >> >>> 1006: function descriptions. 
Empty lists, arrays, sequences, etc are >>> 1007: returned as nullptr which is C programming language >>> 1008: null pointer. >> >> Perhaps instead something like >> >> "returned as a null pointer (C NULL or C++ nullptr)." >> >> "null pointer" is the generic phrase used in both the C and C++ standards. > > Thank you, Kim. I like this suggestion. Updated now. That part looks okay but I think all the parameters and error descriptions changed by JDK-8324680 will now need to change to use "null" instead of "nullptr". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1604344850 From dholmes at openjdk.org Fri May 17 05:55:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 17 May 2024 05:55:03 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v2] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 02:08:34 GMT, Leonid Mesnik wrote: >> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. >> >> It happens when thread_name is set for tracing from jvmti functions. >> See: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 >> >> The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. >> >> The change should affect JVMTI trace mode only (-XX:TraceJVMTI). >> >> Verified by running jvmti/jdi/jdb tests with tracing enabled. > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - copyrights updated. 
> - Merge branch 'master' of https://github.com/openjdk/jdk into 8332259 > - include updated. > - 8332259 I have to wonder whether this solution will potentially cause problems because the code will now block for safepoints. We could fallback to `Thread::name()` if the current thread is in-native. ------------- PR Review: https://git.openjdk.org/jdk/pull/19275#pullrequestreview-2062389545 From iwalulya at openjdk.org Fri May 17 06:10:05 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 17 May 2024 06:10:05 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v6] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 08:55:36 GMT, Albert Mingkun Yang wrote: >> It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probablly fail), fallback to full-gc. >> >> Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outsite VM (by third-party tool). >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into s1-do-collect > - review > - Merge branch 'master' into s1-do-collect > - merge > - review > - Merge branch 'master' into s1-do-collect > - s1-do-collect Changes requested by iwalulya (Reviewer). src/hotspot/share/gc/serial/serialHeap.cpp line 557: > 555: return result; > 556: } > 557: Would be nice to add a comment here to indicate that the previous collection could have shrunk the heap. 
src/hotspot/share/gc/serial/serialHeap.cpp line 714: > 712: > 713: void SerialHeap::do_full_collection_no_gc_locker(bool clear_all_soft_refs) { > 714: IsSTWGCActiveMark gc_active_mark; `IsSTWGCActiveMark active_gc_mark;`in`do_young_collection_no_gc_locker`, just choose one and be consistent with it src/hotspot/share/gc/serial/serialHeap.cpp line 907: > 905: > 906: void SerialHeap::print_tracing_info() const { > 907: // Nothing What is the `Nothing` supposed to convey here? src/hotspot/share/gc/serial/serialHeap.hpp line 117: > 115: void do_full_collection_no_gc_locker(bool clear_all_soft_refs); > 116: > 117: void collect_at_safepoint_no_gc_locker(bool full); I am not very convinced by the naming of the methods with the "no_gc_locker" constraint. But I guess it is following same convention as "*at_safepoint" method naming. ------------- PR Review: https://git.openjdk.org/jdk/pull/19056#pullrequestreview-2062384283 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604380445 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604389765 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604390627 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604388902 From iwalulya at openjdk.org Fri May 17 07:26:10 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 17 May 2024 07:26:10 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v6] In-Reply-To: References: Message-ID: <3BLe1vEaprqbbs1Qa2dFYzxOJ3YDdIAC6XVC-R5fgbg=.2c8bab90-0c18-4c33-a343-6667ff2f92ec@github.com> On Fri, 17 May 2024 07:17:13 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/serial/serialHeap.cpp line 907: >> >>> 905: >>> 906: void SerialHeap::print_tracing_info() const { >>> 907: // Nothing >> >> What is the `Nothing` supposed to convey here? > > To emphasize that this empty method is intentional, inspired by `ZCollectedHeap::print_tracing_info`. 
Then better to keep the verb in the comment ` // Does nothing` >> src/hotspot/share/gc/serial/serialHeap.hpp line 117: >> >>> 115: void do_full_collection_no_gc_locker(bool clear_all_soft_refs); >>> 116: >>> 117: void collect_at_safepoint_no_gc_locker(bool full); >> >> I am not very convinced by the naming of the methods with the "no_gc_locker" constraint. But I guess it is following same convention as "*at_safepoint" method naming. > > How about calling them `try_x` and `x` for the public and private API, respectively, e.g. `try_do_full_collection` and `do_full_collection`? I prefer that to the "no_gc_locker" emphasizing names ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604466593 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604465894 From ayang at openjdk.org Fri May 17 07:26:09 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 17 May 2024 07:26:09 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v6] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 06:02:31 GMT, Ivan Walulya wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge branch 'master' into s1-do-collect >> - review >> - Merge branch 'master' into s1-do-collect >> - merge >> - review >> - Merge branch 'master' into s1-do-collect >> - s1-do-collect > > src/hotspot/share/gc/serial/serialHeap.cpp line 907: > >> 905: >> 906: void SerialHeap::print_tracing_info() const { >> 907: // Nothing > > What is the `Nothing` supposed to convey here? To emphasize that this empty method is intentional, inspired by `ZCollectedHeap::print_tracing_info`. 
> src/hotspot/share/gc/serial/serialHeap.hpp line 117: > >> 115: void do_full_collection_no_gc_locker(bool clear_all_soft_refs); >> 116: >> 117: void collect_at_safepoint_no_gc_locker(bool full); > > I am not very convinced by the naming of the methods with the "no_gc_locker" constraint. But I guess it is following same convention as "*at_safepoint" method naming. How about calling them `try_x` and `x` for the public and private API, respectively, e.g. `try_do_full_collection` and `do_full_collection`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604459781 PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604462152 From jsjolen at openjdk.org Fri May 17 07:28:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 07:28:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 15:22:44 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add corresponding tests to visit_in_order when applicable >> - Remove usage of auto in tests > > src/hotspot/share/utilities/nativeCallStack.hpp line 57: > >> 55: >> 56: class NativeCallStack : public StackObj { >> 57: friend class VMATreeTest; > > I am surprised friend is needed, the private section of this class being so tiny. What does friend give you what you could not get via normal accessors? This permits access to the `_tree` field, so we can inspect it to check that it conforms to the shape we expect it to be. I don't think that we should have an accessor to that field, as that forms a part of its public interface. Test fixtures being friend classes is imo not a code smell, but a natural consequence of doing clear box testing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1604469265 From ayang at openjdk.org Fri May 17 07:44:29 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 17 May 2024 07:44:29 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v7] In-Reply-To: References: Message-ID: > It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probablly fail), fallback to full-gc. > > Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outsite VM (by third-party tool). > > Test: tier1-6 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - review - Merge branch 'master' into s1-do-collect - Merge branch 'master' into s1-do-collect - review - Merge branch 'master' into s1-do-collect - merge - review - Merge branch 'master' into s1-do-collect - s1-do-collect ------------- Changes: https://git.openjdk.org/jdk/pull/19056/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19056&range=06 Stats: 566 lines in 15 files changed: 125 ins; 356 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/19056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19056/head:pull/19056 PR: https://git.openjdk.org/jdk/pull/19056 From ayang at openjdk.org Fri May 17 07:44:29 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 17 May 2024 07:44:29 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v6] In-Reply-To: <3BLe1vEaprqbbs1Qa2dFYzxOJ3YDdIAC6XVC-R5fgbg=.2c8bab90-0c18-4c33-a343-6667ff2f92ec@github.com> References: <3BLe1vEaprqbbs1Qa2dFYzxOJ3YDdIAC6XVC-R5fgbg=.2c8bab90-0c18-4c33-a343-6667ff2f92ec@github.com> Message-ID: On Fri, 17 May 2024 07:22:36 GMT, Ivan Walulya wrote: >> How about calling 
them `try_x` and `x` for the public and private API, respectively, e.g. `try_do_full_collection` and `do_full_collection`? > > I prefer that to the "no_gc_locker" emphasizing names I can do that for `try_collect_at_safepoint`, but `SerialHeap::do_full_collection` is an API from `CollectedHeap`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604485388 From iwalulya at openjdk.org Fri May 17 07:44:29 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 17 May 2024 07:44:29 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v6] In-Reply-To: References: <3BLe1vEaprqbbs1Qa2dFYzxOJ3YDdIAC6XVC-R5fgbg=.2c8bab90-0c18-4c33-a343-6667ff2f92ec@github.com> Message-ID: On Fri, 17 May 2024 07:39:39 GMT, Albert Mingkun Yang wrote: >> I prefer that to the "no_gc_locker" emphasizing names > > I can do that for `try_collect_at_safepoint`, but `SerialHeap::do_full_collection` is an API from `CollectedHeap`. yeah, i only meant change those that are not "override" and have the "no_gclocker" postfix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19056#discussion_r1604487661 From iwalulya at openjdk.org Fri May 17 08:06:12 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 17 May 2024 08:06:12 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v7] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 07:44:29 GMT, Albert Mingkun Yang wrote: >> It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probablly fail), fallback to full-gc. >> >> Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outsite VM (by third-party tool). >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: > > - review > - Merge branch 'master' into s1-do-collect > - Merge branch 'master' into s1-do-collect > - review > - Merge branch 'master' into s1-do-collect > - merge > - review > - Merge branch 'master' into s1-do-collect > - s1-do-collect Looks good! Thanks for the cleanup. ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19056#pullrequestreview-2062603522 From jsjolen at openjdk.org Fri May 17 08:08:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 08:08:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 21:04:49 GMT, Gerard Ziemski wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add corresponding tests to visit_in_order when applicable >> - Remove usage of auto in tests > > src/hotspot/share/nmt/nmtTreap.hpp line 175: > >> 173: #ifdef ASSERT >> 174: void verify_self() { >> 175: const double expected_maximum_depth = log(this->_node_count+1) * 5; > > Where did 5 come from, shouldn't this be: > > ` const double expected_maximum_depth = log(this->_node_count+1) * (this->_node_count+1) > ` > ? The depth of a balanced binary tree is on the order of `log(n)`, "the order of" is important here. Essentially, we need some wiggle room. I found that 3 fails, so I bumped it to 5. This did cause me to investigate whether we can pick a tighter bound, and `3.5` fits as long as we perform `ceil` instead of `floor` on the result. I'll comment where the constant comes from.
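The bound described above can be sketched as a small helper. This is an illustration only, assuming that `log` is the natural logarithm (as in `<cmath>`) and that the slack factor is the 3.5 mentioned above; the helper names are hypothetical and are not taken from `nmtTreap.hpp`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Tolerated depth for a tree holding `node_count` nodes:
// ceil(3.5 * ln(node_count + 1)).
// The factor 3.5 is an empirical slack factor, not a proven bound.
double expected_maximum_depth(std::size_t node_count) {
  return std::ceil(3.5 * std::log(static_cast<double>(node_count) + 1.0));
}

// Hypothetical helper: true if an observed depth stays within the
// tolerated bound for the given number of nodes.
bool depth_is_acceptable(std::size_t observed_depth, std::size_t node_count) {
  return static_cast<double>(observed_depth) <= expected_maximum_depth(node_count);
}
```

Rounding up with `ceil` rather than down matters for small trees, where truncating the product would reject depths that are only a fraction of a level over the unrounded value.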
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1604520911 From jsjolen at openjdk.org Fri May 17 08:45:30 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 08:45:30 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v90] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > ``` > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment.
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > ``` > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Fix visit_in_order tests - Find a closer bound for treap depth and express it in base-2 log ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/d546e26c..59254f47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=89 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=88-89 Stats: 36 lines in 2 files changed: 16 ins; 1 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Fri May 17 08:48:13 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 08:48:13 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 08:05:16 GMT, Johan Sjölen wrote: >>
src/hotspot/share/nmt/nmtTreap.hpp line 175: >> >>> 173: #ifdef ASSERT >>> 174: void verify_self() { >>> 175: const double expected_maximum_depth = log(this->_node_count+1) * 5; >> >> Where did 5 come from, shouldn't this be: >> >> ` const double expected_maximum_depth = log(this->_node_count+1) * (this->_node_count+1) >> ` >> ? > > The depth of a binary tree is on the order of `log(n)`, "the order of" is important here. Essentially, we need some wiggle room. I found that 3 fails, so I bumped it to 5. This did cause me to investigate whether we can pick a tighter bound, and `3.5` fits as long as we perform `ceil` instead of `floor` on the result. > > I'll comment where the constant comes from. This made me a bit curious about other trees bounds. An RB-tree has a bound of `2log_2(N + 1)`, so I decided to find the bound for our treap in base-2 log. Turns out we're at `2.5*log_2(N + 1)`, so very close to the RB-tree. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1604573924 From duke at openjdk.org Fri May 17 08:52:16 2024 From: duke at openjdk.org (kuaiwei) Date: Fri, 17 May 2024 08:52:16 GMT Subject: Withdrawn: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 06:54:01 GMT, kuaiwei wrote: > The origin patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It show regression in some platform, like Apple silicon in mac os > 2 Can not handle instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > It can be fixed by: > 1 Enable AlwaysMergeDMB by default, only disable it in architecture we can see performance improvement (N1 or N2) > 2 Check the special pattern and merge the subsequent dmb. > > It also fix a bug when code buffer is expanding, st/ld/dmb can not be merged. I added unit tests for these. > > This patch still has a unhandled case. 
For a sequence like "dmb.ishld; dmb.ishst; dmb.ish", it merges only the last 2 instructions and cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I do not think it is a common pattern. > > - Update: > After discussion, I made a new implementation based on a finite state machine for merging instructions. A mergeable instruction stays pending in the FSM until the next unmergeable instruction. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/18467 From jsjolen at openjdk.org Fri May 17 08:58:44 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 08:58:44 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v91] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > ``` > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > ``` > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Move definition of struct to gain external linkage Due to mgronlund ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/59254f47..9d2e6768 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=90 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=89-90 Stats: 28 lines in 2 files changed: 14 ins; 14 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From duke at openjdk.org Fri May 17 09:02:13 2024 From: duke at openjdk.org (kuaiwei) Date: Fri, 17 May 2024 09:02:13 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier Message-ID: The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: 1 It shows a regression on some platforms, like Apple silicon on macOS 2 It cannot handle an instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" It can be fixed by: 1 Enable AlwaysMergeDMB by default, and disable it only on architectures where we see a performance improvement (N1 or N2) 2 Check the special pattern and merge the subsequent dmb. It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these. This patch still has an unhandled case. For a sequence like "dmb.ishld; dmb.ishst; dmb.ish", it merges only the last 2 instructions and cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I do not think it is a common pattern.
In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting. ------------- Commit messages: - 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier Changes: https://git.openjdk.org/jdk/pull/19278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325821 Stats: 343 lines in 9 files changed: 330 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/19278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278 PR: https://git.openjdk.org/jdk/pull/19278 From ayang at openjdk.org Fri May 17 09:12:26 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 17 May 2024 09:12:26 GMT Subject: Integrated: 8331557: Serial: Refactor SerialHeap::do_collection In-Reply-To: References: Message-ID: <42-vFtR8If6ZyRbW1G5AL__9eFqxCl-FMOA-4g8NTqE=.25018359-ed9c-4a6f-9ac1-624095cb595c@github.com> On Thu, 2 May 2024 10:48:12 GMT, Albert Mingkun Yang wrote: > It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probablly fail), fallback to full-gc. > > Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outsite VM (by third-party tool). > > Test: tier1-6 This pull request has now been integrated. 
Changeset: f1ce9b0e Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/f1ce9b0ecce9b506f5bf7a66fcf03c93b9ae8fed Stats: 566 lines in 15 files changed: 125 ins; 356 del; 85 mod 8331557: Serial: Refactor SerialHeap::do_collection Reviewed-by: gli, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/19056 From ayang at openjdk.org Fri May 17 09:12:25 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 17 May 2024 09:12:25 GMT Subject: RFR: 8331557: Serial: Refactor SerialHeap::do_collection [v7] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 07:44:29 GMT, Albert Mingkun Yang wrote: >> It's probably easier to read the new code directly. The two classes in `serialVMOperations` serve as entrance points to invoke young/full GCs. Some previously hidden decisions are made more obvious, e.g. if a young-gc fails (or will probably fail), fall back to full-gc. >> >> Additionally, `StatRecord` is removed, because this kind of info-aggregation should be done outside the VM (by a third-party tool). >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - review > - Merge branch 'master' into s1-do-collect > - Merge branch 'master' into s1-do-collect > - review > - Merge branch 'master' into s1-do-collect > - merge > - review > - Merge branch 'master' into s1-do-collect > - s1-do-collect Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19056#issuecomment-2117095015 From jsjolen at openjdk.org Fri May 17 09:23:12 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 09:23:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v91] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 08:58:44 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`.
Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> ``` >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately.
When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Move definition of struct to gain external linkage > > Due to mgronlund Thanks to Markus Grönlund for figuring out the linking error on Windows debug builds. It has to do with AddressState having internal linkage when defined in `.cpp` and then using it as a template parameter, causing the whole type to have internal linkage (and this is an issue apparently). This is a gray area in the standard, apparently :-). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2117116054 From ayang at openjdk.org Fri May 17 09:34:09 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 17 May 2024 09:34:09 GMT Subject: RFR: 8332448: Make SpaceMangler inherit AllStatic Message-ID: Extract the state for `top` out of `SpaceMangler`. Users (Serial and Parallel GC) already track the top before/after GC. The "real" changes in this PR are in only two places: `serialFullGC.cpp` and `PSParallelCompact::post_compact`.
Test: tier1-5 ------------- Commit messages: - mangle Changes: https://git.openjdk.org/jdk/pull/19279/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19279&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332448 Stats: 518 lines in 26 files changed: 9 ins; 486 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/19279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19279/head:pull/19279 PR: https://git.openjdk.org/jdk/pull/19279 From alanb at openjdk.org Fri May 17 09:34:11 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 17 May 2024 09:34:11 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v7] In-Reply-To: References: Message-ID: <9oh0XZoux2OKBke0T-hr6CS9OsuDD6Wk1HdQdPu2YyY=.1d9a2a75-cb80-498c-86f9-da457669e3e8@github.com> On Thu, 16 May 2024 12:23:44 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. 
>> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add note on --illegal-native-access default value in the launcher help src/java.base/share/classes/java/lang/foreign/package-info.java line 170: > 168: * the special value {@code ALL-UNNAMED} can be used). Access to restricted methods > 169: * from modules not listed by that option is deemed illegal. Clients can > 170: * control how illegal access to restricted method is handled, using the command line I assume this should be "to a restricted method is handled" or "to restricted methods are handled", either would work here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1604637950 From alanb at openjdk.org Fri May 17 09:40:21 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 17 May 2024 09:40:21 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 12:23:44 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add note on --illegal-native-access default value in the launcher help src/java.base/share/classes/jdk/internal/access/JavaLangAccess.java line 288: > 286: * throw exception depending on the configuration. > 287: */ > 288: void ensureNativeAccess(Module m, Class owner, String methodName, Class currentClass, boolean jni); It might be helpful to future maintainers if we put `@param` descriptions for these parameters. I had to re-read Module.enableNativeAccess to remember the difference between the owner class and the current class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1604644767 From alanb at openjdk.org Fri May 17 09:44:05 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 17 May 2024 09:44:05 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 12:23:44 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. 
In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add note on --illegal-native-access default value in the launcher help src/java.base/share/classes/java/lang/ClassLoader.java line 2448: > 2446: * Invoked in the VM class linking code. > 2447: */ > 2448: static long findNative(ClassLoader loader, Class<?> clazz, String entryName, String javaName) { I think this is another place where `@param` descriptions would help as it's not immediately clear that "javaName" is a method name. src/java.base/share/classes/java/lang/Runtime.java line 39: > 37: > 38: import jdk.internal.access.SharedSecrets; > 39: import jdk.internal.javac.Restricted; Runtime hasn't been touched for a while so you'll need to bump the copyright year.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1604648529 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1604650293 From alanb at openjdk.org Fri May 17 09:48:13 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 17 May 2024 09:48:13 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 12:23:44 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Add note on --illegal-native-access default value in the launcher help This looks good. Just a few minor comments where future maintainers might appreciate comments that describe parameters. src/java.base/share/classes/java/lang/Module.java line 332: > 330: String caller = currentClass != null ? currentClass.getName() : "code"; > 331: if (jni) { > 332: System.err.printf(""" System.err may change in a running VM. It may be that we will need to change this at some point to use its initial setting. Not suggesting we change it now but we might have to re-visit this. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19213#pullrequestreview-2062832385 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1604653749 From jsjolen at openjdk.org Fri May 17 11:27:21 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 11:27:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 08:45:49 GMT, Johan Sjölen wrote: >> The depth of a binary tree is on the order of `log(n)`, "the order of" is important here. Essentially, we need some wiggle room. I found that 3 fails, so I bumped it to 5. This did cause me to investigate whether we can pick a tighter bound, and `3.5` fits as long as we perform `ceil` instead of `floor` on the result. >> >> I'll comment where the constant comes from. > > This made me a bit curious about other trees' bounds. An RB-tree has a bound of `2log_2(N + 1)`, so I decided to find the bound for our treap in base-2 log. Turns out we're at `2.5*log_2(N + 1)`, so very close to the RB-tree.
># assert(maximum_depth_found <= ceil(expected_maximum_depth)) failed: depth unexpectedly large, was: 57, expected: 56 Apparently there's a very small chance of it being worse than that :-). I'll adjust the expectations here a bit more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1604815042 From shade at openjdk.org Fri May 17 11:35:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 17 May 2024 11:35:37 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v3] In-Reply-To: References: Message-ID: <08qAsGXFBbttgcQ0tfpfKQmtcuyiP4E2I0Tvi1tJNCE=.3cf52475-8b0e-4f7b-b43d-48305b74e008@github.com> > As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. > > This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. > > After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. 
> > Additional testing: > - [x] Performance test reproducer from the bug improves significantly > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Chicken out: do not notify Service thread from OopMapCache enqueue paths ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19229/files - new: https://git.openjdk.org/jdk/pull/19229/files/29dee418..c81a1139 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19229&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19229&range=01-02 Stats: 22 lines in 5 files changed: 10 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19229/head:pull/19229 PR: https://git.openjdk.org/jdk/pull/19229 From shade at openjdk.org Fri May 17 11:35:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 17 May 2024 11:35:37 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v2] In-Reply-To: <-tSv8ySxibUCWI0vT1FMkA7zP5iL8AfQSsaQKw_bAMs=.ece60fa5-e7c2-4e53-a9dd-e8066474b3c3@github.com> References: <-tSv8ySxibUCWI0vT1FMkA7zP5iL8AfQSsaQKw_bAMs=.ece60fa5-e7c2-4e53-a9dd-e8066474b3c3@github.com> Message-ID: On Thu, 16 May 2024 17:47:16 GMT, Coleen Phillimore wrote: >> Well, tests pass with this change, but now I am thinking if we would eventually run into any lock ranking problem here. At very least `stackwatermark` is ranked above `service`, so we are safe for concurrent GCs. There are only a few locks that are ranked below `service`, so maybe I am overthinking this? > > It is a low level lock, I think it'll be ok, you could check out some call stacks but the tests should find these lock inversions if they exist (famous last words). 
OK, the "problem" here is that OopMapCache is used from the generic interpreter frame walkers, so I cannot really trace the callers all that well. And it looks to me that capturing the failure in tests would be maddeningly hard: the notification happens very rarely. So we cannot even rely on current tests. So I am chickening out from notifying the service thread on this path. Instead, we would rely on service thread timed wait in current mainline. I amended the patch a bit, with the method that can trigger the cleanup. When/if we backport this change to previous releases, we can just hook the call to that method to safepoint cleanup sequence, like we used to have for String/SymbolTable; which would cover the case when service thread is not time-waited yet. This also likely frees application threads from the cleanup, as cleanup is delegated solely to Service thread now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19229#discussion_r1604830095 From jsjolen at openjdk.org Fri May 17 11:41:30 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 17 May 2024 11:41:30 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v92] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Allow for up to 3 extra in depth ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/9d2e6768..75fcefbc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=91 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=90-91 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From tholenstein at openjdk.org Fri May 17 11:54:02 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 17 May 2024 11:54:02 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Thu, 9 May 2024 02:36:25 GMT, Dean Long wrote: > FWIW, I decided to look into WXExec as default (JDK-8328306), and in my draft so far I have removed AssertWXAtThreadSync completely, and I suspect that a successful implementation of exec-by-default will make JDK-8307817 no longer needed as well. Thanks for looking at JDK-8328306. Sounds like an interesting approach that could simplify things with WXExec/WXWrite. But do you think we can integrate this PR until JDK-8328306 is ready? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19102#issuecomment-2117427816 From rehn at openjdk.org Fri May 17 12:50:18 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 17 May 2024 12:50:18 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v13] In-Reply-To: References: Message-ID: > Hi, please consider. > > We have code that directly uses the asm for calls/jumps instead of masm.
> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics.
> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master)
>
> j offset      jal x0, offset                                          Jump
> jal offset    jal x1, offset                                          Jump and link
> jr rs         jalr x0, rs, 0                                          Jump register
> jalr rs       jalr x1, rs, 0                                          Jump and link register
> ret           jalr x0, x1, 0                                          Return from subroutine
> call offset   auipc x1, offset[31:12]; jalr x1, x1, offset[11:0]      Call far-away subroutine
> tail offset   auipc x6, offset[31:12]; jalr x0, x6, offset[11:0]      Tail call far-away subroutine
>
> But these can only be implemented like this if you have small enough application.
> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable).
> We don't have GOT, instead we materialize, so there is still differences between these and ours.
>
> This patch:
> - Tries to follow these suggested mappings as good we can.
> - Make sure all jumps/calls go through MASM. (so we get control and can easily change for sites using a certain calling convention)
> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming.
> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'.
> - I enabled c.j, but right now we never generate it.
> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags))
>
> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump.
> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1)
> While looking into our calls it was a bit confusing, this helps.
>
> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4)
> Re-running tests, had some last minute changes.
> > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' into jal-fixes - Use la() instead movptr where ok. - Review changes - Merge branch 'master' into jal-fixes - Merge branch 'master' into jal-fixes - Revert JNI field, call()->li() - Use li instead of movptr for call - REVERT: Use li instead of movptr - Use li instead of movptr - VM leaf should use li - ... and 6 more: https://git.openjdk.org/jdk/compare/e5692ccc...d882cd59 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18942/files - new: https://git.openjdk.org/jdk/pull/18942/files/b663e872..d882cd59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18942&range=11-12 Stats: 15735 lines in 342 files changed: 9020 ins; 4995 del; 1720 mod Patch: https://git.openjdk.org/jdk/pull/18942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18942/head:pull/18942 PR: https://git.openjdk.org/jdk/pull/18942 From mbaesken at openjdk.org Fri May 17 13:03:22 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 17 May 2024 13:03:22 GMT Subject: RFR: 8332473: ubsan: growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null Message-ID: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> On Linux x86_64 fastdebug with ubsan enabled we run into this error because we call qsort with a first parameter that is null. 
/jdk/src/hotspot/share/utilities/growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null
#0 0x150d701bb4b1 in GrowableArrayView::sort(int (*)(nmethod**, nmethod**)) /jdk/src/hotspot/share/utilities/growableArray.hpp:290
#1 0x150d701bb4b1 in ClassUnloadingContext::free_nmethods() /jdk/src/hotspot/share/gc/shared/classUnloadingContext.cpp:159
#2 0x150d71f5cca3 in G1CollectedHeap::unload_classes_and_code(char const*, BoolObjectClosure*, GCTimer*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:2538
#3 0x150d71ffb009 in G1FullCollector::phase1_mark_live_objects() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:330
#4 0x150d71ffc675 in G1FullCollector::collect() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:209
#5 0x150d71f3e593 in G1CollectedHeap::do_full_collection(bool, bool) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:842
#6 0x150d71f5b12d in G1CollectedHeap::satisfy_failed_allocation_helper(unsigned long, bool, bool, bool, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:917
#7 0x150d71f5b3dc in G1CollectedHeap::satisfy_failed_allocation(unsigned long, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:930
#8 0x150d721835f7 in VM_G1CollectForAllocation::doit() /jdk/src/hotspot/share/gc/g1/g1VMOperations.cpp:127
#9 0x150d74291ec8 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75
#10 0x150d742ca1be in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283
#11 0x150d742cb9e7 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427
#12 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493
#13 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478
seems we sometimes call qsort with nullptr as first parameter, this is not recommended. When adding a guarantee the same can be seen (_data is null).
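To illustrate the failure mode, here is a minimal standalone sketch. It assumes a hypothetical `TinyArrayView` type and a hypothetical `compare_ints` comparator; it is not HotSpot's `GrowableArrayView` and not the actual patch. An empty view holds a null data pointer, and `sort` simply skips `qsort` in that case:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a growable array view (illustration only).
template <typename E>
struct TinyArrayView {
  E*  _data;  // may be null when no backing storage was ever allocated
  int _len;   // number of elements in use

  // Sorts the elements in place with qsort.
  // The ubsan report says qsort's first argument is declared to never be
  // null, so a null _data must not be passed on, even when _len is zero.
  void sort(int (*cmp)(const void*, const void*)) {
    if (_data == nullptr) {
      return;  // nothing to sort
    }
    std::qsort(_data, static_cast<std::size_t>(_len), sizeof(E), cmp);
  }
};

// Hypothetical comparator: ascending order for int elements.
static int compare_ints(const void* a, const void* b) {
  const int x = *static_cast<const int*>(a);
  const int y = *static_cast<const int*>(b);
  return (x > y) - (x < y);
}
```

With this guard, sorting an empty view is a no-op, while a populated view is sorted as before.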
So better add a check and do not sort, if there is nothing provided to be sorted . ------------- Commit messages: - JDK-8332473 Changes: https://git.openjdk.org/jdk/pull/19283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19283&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332473 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19283.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19283/head:pull/19283 PR: https://git.openjdk.org/jdk/pull/19283 From mcimadamore at openjdk.org Fri May 17 13:23:06 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 17 May 2024 13:23:06 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v7] In-Reply-To: References: Message-ID: On Thu, 16 May 2024 18:39:57 GMT, Alan Bateman wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add note on --illegal-native-access default value in the launcher help > > src/java.base/share/classes/java/lang/System.java line 2023: > >> 2021: * @throws NullPointerException if {@code filename} is {@code null} >> 2022: * @throws IllegalCallerException If the caller is in a module that >> 2023: * does not have native access enabled. > > The exception description is fine, just noticed the other exception descriptions start with a lowercase "if", this one is different. I'll fix this. Note that in `ModuleLayer.Controller`, all `@throws` start with capital letter, which is probably where I copied/pasted this from. I'll fix all, except for `ModuleLayer` where, for consistency, I think upper case is better. > src/java.base/share/man/java.1 line 587: > >> 585: \f[V]deny\f[R]: This mode disables all illegal native access except for >> 586: those modules enabled by the \f[V]--enable-native-access\f[R] >> 587: command-line option. 
> > "This mode disable all illegal native access except for those modules enabled the --enable-native-access command-line option". > > This can be read to mean that modules granted native access with the command line option is also illegal native access An alternative is to make the second part of the sentence a new sentence, something like "Only modules enabled by the --enable-native-access command line option may perform native access. I've simplified the text to: This mode disables illegal native access. That is, any illegal native access causes an `IllegalCallerException`. This mode will become the default in a future release. I think it's not necessary to state again the dependency on `--enable-native-access` as we already defined what "illegal native access" means. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1604994928 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1604993505 From mcimadamore at openjdk.org Fri May 17 13:38:25 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Fri, 17 May 2024 13:38:25 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: > This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: > > * `System::load` and `System::loadLibrary` are now restricted methods > * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods > * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation > > This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. 
In Java 22, the single `--enable-native-access` flag was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access` flag is used at least once. > > Here, a new flag is introduced, namely `--illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. > > Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above.
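As a rough model of the policy described above, the decision taken for a restricted operation can be pictured as follows. This is a hypothetical sketch and not JDK code: the names `IllegalNativeAccess`, `Outcome` and `check_native_access` are invented for illustration, and the real launcher and runtime logic may well be structured differently.

```cpp
#include <set>
#include <string>

// Hypothetical model of the three policy values described in the PR text.
enum class IllegalNativeAccess { Allow, Warn, Deny };

// What happens to a single restricted operation.
enum class Outcome { Permitted, PermittedWithWarning, Rejected };

// Decide the outcome for a restricted operation performed by `module`.
// `enabled_modules` stands in for the set given via --enable-native-access.
// The default mode is Warn, mirroring the default policy stated above.
Outcome check_native_access(const std::string& module,
                            const std::set<std::string>& enabled_modules,
                            IllegalNativeAccess mode = IllegalNativeAccess::Warn) {
  // Access from an explicitly enabled module is never "illegal".
  if (enabled_modules.count(module) != 0) {
    return Outcome::Permitted;
  }
  switch (mode) {
    case IllegalNativeAccess::Allow: return Outcome::Permitted;
    case IllegalNativeAccess::Warn:  return Outcome::PermittedWithWarning;
    case IllegalNativeAccess::Deny:  return Outcome::Rejected;  // models IllegalCallerException
  }
  return Outcome::Rejected;  // not reached; keeps compilers quiet
}
```

The point of the sketch is only that the set of enabled modules and the reaction to illegal access are now two independent inputs, where previously a single flag carried both.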
Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19213/files - new: https://git.openjdk.org/jdk/pull/19213/files/3a0db276..789bdf48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19213&range=06-07 Stats: 28 lines in 10 files changed: 8 ins; 2 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/19213.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19213/head:pull/19213 PR: https://git.openjdk.org/jdk/pull/19213 From coleenp at openjdk.org Fri May 17 15:01:03 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 17 May 2024 15:01:03 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v3] In-Reply-To: <08qAsGXFBbttgcQ0tfpfKQmtcuyiP4E2I0Tvi1tJNCE=.3cf52475-8b0e-4f7b-b43d-48305b74e008@github.com> References: <08qAsGXFBbttgcQ0tfpfKQmtcuyiP4E2I0Tvi1tJNCE=.3cf52475-8b0e-4f7b-b43d-48305b74e008@github.com> Message-ID: On Fri, 17 May 2024 11:35:37 GMT, Aleksey Shipilev wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. 
>> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. >> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Chicken out: do not notify Service thread from OopMapCache enqueue paths This looks safe. You still get some trigger for cleanup, plus the ServiceThread timeout. The timer comment is vague enough to cover this case also. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19229#pullrequestreview-2063667904 From jsjolen at openjdk.org Fri May 17 15:09:02 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 17 May 2024 15:09:02 GMT Subject: RFR: 8332473: ubsan: growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null In-Reply-To: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> References: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> Message-ID: On Fri, 17 May 2024 12:59:07 GMT, Matthias Baesken wrote: > On Linux x86_64 fastdebug with ubsan enabled we run into this error because we call qsort with a first parameter that is null. 
> > /jdk/src/hotspot/share/utilities/growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null > #0 0x150d701bb4b1 in GrowableArrayView::sort(int (*)(nmethod**, nmethod**)) /jdk/src/hotspot/share/utilities/growableArray.hpp:290 > #1 0x150d701bb4b1 in ClassUnloadingContext::free_nmethods() /jdk/src/hotspot/share/gc/shared/classUnloadingContext.cpp:159 > #2 0x150d71f5cca3 in G1CollectedHeap::unload_classes_and_code(char const*, BoolObjectClosure*, GCTimer*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:2538 > #3 0x150d71ffb009 in G1FullCollector::phase1_mark_live_objects() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:330 > #4 0x150d71ffc675 in G1FullCollector::collect() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:209 > #5 0x150d71f3e593 in G1CollectedHeap::do_full_collection(bool, bool) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:842 > #6 0x150d71f5b12d in G1CollectedHeap::satisfy_failed_allocation_helper(unsigned long, bool, bool, bool, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:917 > #7 0x150d71f5b3dc in G1CollectedHeap::satisfy_failed_allocation(unsigned long, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:930 > #8 0x150d721835f7 in VM_G1CollectForAllocation::doit() /jdk/src/hotspot/share/gc/g1/g1VMOperations.cpp:127 > #9 0x150d74291ec8 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #10 0x150d742ca1be in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #11 0x150d742cb9e7 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #12 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #13 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > > seems we sometimes call qsort with nullptr as first parameter, this is not recommended. > When adding a guarantee the same can be seen (_data is null). 
> So better add a check and do not sort, if there is nothing provided to be sorted . Right, because `GrowableArray` will not allocate anything when the passed in `capacity` is 0 and will set the data pointer to null. This turns out not to be a problem in practice, as the length of the array is 0 (and so the pointer should not be dereferenced). I'm OK with this. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19283#pullrequestreview-2063684220 From zgu at openjdk.org Fri May 17 15:17:02 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 17 May 2024 15:17:02 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v3] In-Reply-To: <08qAsGXFBbttgcQ0tfpfKQmtcuyiP4E2I0Tvi1tJNCE=.3cf52475-8b0e-4f7b-b43d-48305b74e008@github.com> References: <08qAsGXFBbttgcQ0tfpfKQmtcuyiP4E2I0Tvi1tJNCE=.3cf52475-8b0e-4f7b-b43d-48305b74e008@github.com> Message-ID: On Fri, 17 May 2024 11:35:37 GMT, Aleksey Shipilev wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. >> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. 
This PR already covers the concurrent GC paths well. >> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Chicken out: do not notify Service thread from OopMapCache enqueue paths I would suggest moving `OopMapCache::trigger_cleanup();` from `VM_ShenandoahReferenceOperation::doit_epilogue()` to `VM_ShenandoahOperation::doit_prologue()` and adding the call to `VM_ZOperation`'s and `VM_XOperation`'s `doit_epilogue()`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2117821424 From gziemski at openjdk.org Fri May 17 15:30:11 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 17 May 2024 15:30:11 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 11:24:10 GMT, Johan Sjölen wrote: >> This made me a bit curious about other trees' bounds. An RB-tree has a bound of `2log_2(N + 1)`, so I decided to find the bound for our treap in base-2 log. Turns out we're at `2.5*log_2(N + 1)`, so very close to the RB-tree. > >># assert(maximum_depth_found <= ceil(expected_maximum_depth)) failed: depth unexpectedly large, was: 57, expected: 56 > > Apparently there's a very small chance of it being worse than that :-). I'll adjust the expectations here a bit more. The absolute worst case scenario is O(n) and the best is O(log(n)), and since we use random numbers to balance the tree it should be closer to the best case, correct?
I'm just wondering whether we really want: `assert(maximum_depth_found <= (int)expected_maximum_depth, "depth unexpectedly large");` We cannot guarantee that it will not be triggered at some point in the future, and if it is, how useful will it be for someone to investigate? I would consider printing out a warning instead of using `assert` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1605195124 From shade at openjdk.org Fri May 17 15:36:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 17 May 2024 15:36:09 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v3] In-Reply-To: References: <08qAsGXFBbttgcQ0tfpfKQmtcuyiP4E2I0Tvi1tJNCE=.3cf52475-8b0e-4f7b-b43d-48305b74e008@github.com> Message-ID: On Fri, 17 May 2024 15:14:07 GMT, Zhengyu Gu wrote: > I would suggest to move `OopMapCache::trigger_cleanup();` from `VM_ShenandoahReferenceOperation::doit_epilogue()` to `VM_ShenandoahOperation::doit_prologue()` and add the call to ` VM_ZOperation` and ` VM_XOperation`'s `doit_epilogue()` Right. Surely, it would be better to move it to `VM_ShenandoahOperation::doit_epilogue()` as well? This way we don't risk the Service thread waking up during the short GC pause. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2117859377 From jsjolen at openjdk.org Fri May 17 15:44:11 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 17 May 2024 15:44:11 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v89] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 15:27:33 GMT, Gerard Ziemski wrote: >The absolute worst case scenario is O(n) and the best is O(log(n)) and since we use random numbers to balance the tree it should be closer to the best case, correct? Right. In fact, it shouldn't be even close to O(n).
>We cannot guarantee that it will not be triggered at some point in the future, and if it does, how useful it will be for someone to investigate? I would consider printing out a warning instead of using assert here. To be clear: This is in a verification function only called in two specific tests meant to challenge the treap (in some sense). If we do have a large deviation then that is probably worth investigating, it's an indication that something is going on. If we have intermittent failures because we chose a very tight bound, then we might have to increase that bound slightly, or we investigate how good our PRNG is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1605212325 From shade at openjdk.org Fri May 17 15:58:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 17 May 2024 15:58:32 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v4] In-Reply-To: References: Message-ID: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> > As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. > > This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. > > After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. 
I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. > > Additional testing: > - [x] Performance test reproducer from the bug improves significantly > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Add more GC triggers around ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19229/files - new: https://git.openjdk.org/jdk/pull/19229/files/c81a1139..ad9f97f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19229&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19229&range=02-03 Stats: 16 lines in 3 files changed: 13 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19229/head:pull/19229 PR: https://git.openjdk.org/jdk/pull/19229 From zgu at openjdk.org Fri May 17 16:18:03 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 17 May 2024 16:18:03 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v4] In-Reply-To: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> References: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> Message-ID: On Fri, 17 May 2024 15:58:32 GMT, Aleksey Shipilev wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. 
But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. >> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. >> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add more GC triggers around LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19229#pullrequestreview-2063858977 From mbaesken at openjdk.org Fri May 17 16:25:01 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 17 May 2024 16:25:01 GMT Subject: RFR: 8332473: ubsan: growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null In-Reply-To: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> References: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> Message-ID: On Fri, 17 May 2024 12:59:07 GMT, Matthias Baesken wrote: > On Linux x86_64 fastdebug with ubsan enabled we run into this error because we call qsort with a first parameter that is null. 
> > /jdk/src/hotspot/share/utilities/growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null > #0 0x150d701bb4b1 in GrowableArrayView::sort(int (*)(nmethod**, nmethod**)) /jdk/src/hotspot/share/utilities/growableArray.hpp:290 > #1 0x150d701bb4b1 in ClassUnloadingContext::free_nmethods() /jdk/src/hotspot/share/gc/shared/classUnloadingContext.cpp:159 > #2 0x150d71f5cca3 in G1CollectedHeap::unload_classes_and_code(char const*, BoolObjectClosure*, GCTimer*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:2538 > #3 0x150d71ffb009 in G1FullCollector::phase1_mark_live_objects() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:330 > #4 0x150d71ffc675 in G1FullCollector::collect() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:209 > #5 0x150d71f3e593 in G1CollectedHeap::do_full_collection(bool, bool) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:842 > #6 0x150d71f5b12d in G1CollectedHeap::satisfy_failed_allocation_helper(unsigned long, bool, bool, bool, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:917 > #7 0x150d71f5b3dc in G1CollectedHeap::satisfy_failed_allocation(unsigned long, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:930 > #8 0x150d721835f7 in VM_G1CollectForAllocation::doit() /jdk/src/hotspot/share/gc/g1/g1VMOperations.cpp:127 > #9 0x150d74291ec8 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #10 0x150d742ca1be in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #11 0x150d742cb9e7 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #12 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #13 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > > seems we sometimes call qsort with nullptr as first parameter, this is not recommended. > When adding a guarantee the same can be seen (_data is null). 
> So better add a check and do not sort, if there is nothing provided to be sorted. Hi Johan, thanks for the review. By the way, it seems I found a similar one: /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:562:5: runtime error: null pointer passed as argument 2, which is declared to never be null #0 0x7fd95bec78d8 in spawnChild /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:562 #1 0x7fd95bec78d8 in startChild /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:612 #2 0x7fd95bec78d8 in Java_java_lang_ProcessImpl_forkAndExec /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:712 #3 0x7fd93797a06d () but here it is memcpy, not qsort: `memcpy(buf+offset, c->pdir, sp.dirlen);` gets a null second parameter. Something similar was discussed and fixed for Python here: https://bugs.python.org/issue27570 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19283#issuecomment-2117947985 From lmesnik at openjdk.org Fri May 17 16:46:29 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 17 May 2024 16:46:29 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v3] In-Reply-To: References: Message-ID: > The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. > > It happens when thread_name is set for tracing from jvmti functions. > See: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 > > The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. > > The change should affect JVMTI trace mode only (-XX:TraceJVMTI). > > Verified by running jvmti/jdi/jdb tests with tracing enabled.
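Returning briefly to the ubsan findings in the 8332473 thread above: both reports come down to handing a null pointer, together with a zero length, to a C library routine whose pointer parameters are declared non-null. The following is a minimal sketch of the kind of guard being proposed. It uses a hypothetical `IntArrayView` type rather than HotSpot's `GrowableArrayView`, and it is not the actual patch.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an array view whose data pointer stays null
// when it was created with capacity 0. Not HotSpot code.
struct IntArrayView {
  int* data;
  int  len;

  // Sort in place. When there is nothing to sort, return before calling
  // qsort, so that a null data pointer is never passed to it.
  void sort(int (*cmp)(const void*, const void*)) {
    if (data == nullptr || len == 0) {
      return;
    }
    std::qsort(data, static_cast<std::size_t>(len), sizeof(int), cmp);
  }
};

// Three-way comparison of two ints, in the shape qsort expects.
int compare_ints(const void* a, const void* b) {
  int x = *static_cast<const int*>(a);
  int y = *static_cast<const int*>(b);
  return (x > y) - (x < y);
}
```

The same early-return shape would presumably apply to the `memcpy` case: skip the call when the source pointer is null and the length is 0.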
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: don't change state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19275/files - new: https://git.openjdk.org/jdk/pull/19275/files/c534c91b..f8fd4744 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=01-02 Stats: 15 lines in 2 files changed: 13 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19275/head:pull/19275 PR: https://git.openjdk.org/jdk/pull/19275 From lmesnik at openjdk.org Fri May 17 16:56:04 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 17 May 2024 16:56:04 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v2] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 05:52:36 GMT, David Holmes wrote: > I have to wonder whether this solution will potentially cause problems because the code will now block for safepoints. We could fallback to `Thread::name()` if the current thread is in-native. Thanks for the feedback. Here is the update. I've updated safe_get_thread_name() to not change the thread state. In the "jvmtiEnter.xsl" functions the thread name is read once, before the transition happens, and then reused. So I updated the tracing to 're-read' the name after the transition to VM has happened, so that the thread name is updated once it becomes known. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19275#issuecomment-2118004152 From duke at openjdk.org Fri May 17 17:10:06 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 17 May 2024 17:10:06 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v10] In-Reply-To: References: Message-ID: > Performance.
Before: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s > > Performance, no intrinsic: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ± 10.591 ops/s > > Performance, **with intrinsics*...
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: comments from Sandhya ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18583/files - new: https://git.openjdk.org/jdk/pull/18583/files/8cd095dd..5c360e35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=08-09 Stats: 82 lines in 4 files changed: 1 ins; 59 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/18583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583 PR: https://git.openjdk.org/jdk/pull/18583 From duke at openjdk.org Fri May 17 17:10:09 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 17 May 2024 17:10:09 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9] In-Reply-To: <9aJy6ON5gSI5ihwK-WkvnyrtHjJTPN5IAFymf1Jpp9M=.32b8ee27-465d-47d8-9099-22cb846cff9a@github.com> References: <9aJy6ON5gSI5ihwK-WkvnyrtHjJTPN5IAFymf1Jpp9M=.32b8ee27-465d-47d8-9099-22cb846cff9a@github.com> Message-ID: <4xhuUYOdlWYYc55JwT3nlFeXj6E5gjrYfauN_RF3DB0=.73d9f401-3dda-4450-94bf-fa4a99cf31e4@github.com> On Thu, 16 May 2024 23:21:36 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 168: > >> 166: XMMRegister broadcast5 = xmm24; >> 167: KRegister limb0 = k1; >> 168: KRegister limb5 = k2; > > limb5 and select are not being used anymore. Thanks, fixed (and also broadcast5) > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 185: > >> 183: __ evmovdquq(modulus, allLimbs, ExternalAddress(modulus_p256()), false, Assembler::AVX_512bit, rscratch); >> 184: >> 185: // A = load(*aLimbs) > > A little bit more description in comments on what the load step involves would be helpful. e.g. 
Load upper 4 limbs, shift left by 1 limb using perm, or in the lowest limb. Done > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 270: > >> 268: __ push(r14); >> 269: __ push(r15); >> 270: > > No need to save/restore rbx, r12, r14, r15. Only r13 is used as temp in montgomeryMultiply(aLimbs, bLimbs, rLimbs). That too could be easily changed to r8. Seems I forgot to clean up completely, thanks! (Originally copied from the poly1305 stub.) > src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 286: > >> 284: __ mov(aLimbs, c_rarg0); >> 285: __ mov(bLimbs, c_rarg1); >> 286: __ mov(rLimbs, c_rarg2); > > We could directly call montgomeryMultiply(c_rarg0, c_rarg1, c_rarg2) then these moves are not necessary. Gave them symbolic names and passed the gpr temp and parameter. The vector register map is still in the montgomeryMultiply function, but the gprs are explicitly passed in. 'Close enough'? > src/hotspot/cpu/x86/vm_version_x86.cpp line 1370: > >> 1368: >> 1369: #ifdef _LP64 >> 1370: if (supports_avx512ifma() && supports_avx512vlbw() && MaxVectorSize >= 64) { > > No need to tie the intrinsic to MaxVectorSize setting. Done > src/hotspot/share/opto/library_call.cpp line 7564: > >> 7562: >> 7563: if (!stubAddr) return false; >> 7564: if (stopped()) return true; > > Line 7564 seems redundant here as there is no range check or anything like that before this. Oh. That is what that is for... I thought it was some sort of 'VM quitting' short-circuit. Removed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328906 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328960 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328859 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328829 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605329040 PR Review Comment: https://git.openjdk.org/jdk/pull/18583#discussion_r1605328995 From sspitsyn at openjdk.org Fri May 17 20:43:00 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 May 2024 20:43:00 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: References: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> <_CuYvr39rfebBcJRO0AM-2p8yQ2-V0oboFclyxAJ7Mo=.8cdba311-3f93-4c95-ac8b-6d7d41d88e24@github.com> Message-ID: On Fri, 17 May 2024 04:34:22 GMT, Kim Barrett wrote: > So JDK-8324680 was somewhat mistaken about what needed to be done, and what was done. The `jvmti.xml` is used to generate several things with the XSL scripts: - JVMTI spec (`jvm.html`) - JVMTI api (`jvmti.h`) - `jvmtiEnter.cpp`, `jvmtiEnterTrace.cpp` In fact, it is pretty tricky to separate these usage aspects of `nullptr` or `NULL`. One of the approaches is to undo the [JDK-8324680](https://bugs.openjdk.org/browse/JDK-8324680). Please, let me know if you prefer this path. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19257#issuecomment-2118354928 From duke at openjdk.org Fri May 17 21:16:47 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 17 May 2024 21:16:47 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11] In-Reply-To: References: Message-ID: > Performance. 
Before: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s > > Performance, no intrinsic: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s > SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s > Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ± 10.591 ops/s > > Performance, **with intrinsics*...
Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  shenandoah verifier

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18583/files
  - new: https://git.openjdk.org/jdk/pull/18583/files/5c360e35..df4fe6fa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=09-10

  Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18583.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583

PR: https://git.openjdk.org/jdk/pull/18583

From sviswanathan at openjdk.org Fri May 17 21:58:14 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 17 May 2024 21:58:14 GMT
Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote:

>> Performance. Before:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s
>> Benchmark (isMontBench) Mode Cnt Score Error Units
>> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s
>>
>> Performance, no intrinsic:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s
>> Benchmark (isMontBench) Mode Cnt Score Error Units
>> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   shenandoah verifier

Marked as reviewed by sviswanathan (Reviewer).

The intrinsics and the C2 changes look good to me.

-------------

PR Review: https://git.openjdk.org/jdk/pull/18583#pullrequestreview-2064439617
PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2118426661

From duke at openjdk.org Fri May 17 22:20:08 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 17 May 2024 22:20:08 GMT
Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote:

>> Performance. Before:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s
>> Benchmark (isMontBench) Mode Cnt Score Error Units
>> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s
>>
>> Performance, no intrinsic:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s
>> Benchmark (isMontBench) Mode Cnt Score Error Units
>> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57...
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   shenandoah verifier

Thanks Sandhya! Now that I have @ascarpino approval as well, I plan to integrate next Tuesday.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2118443577 From lmesnik at openjdk.org Fri May 17 22:31:32 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 17 May 2024 22:31:32 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v4] In-Reply-To: References: Message-ID: > The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. > > It happens when thread_name is set for tracing from jvmti functions. > See: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 > > The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. > > The change should affect JVMTI trace mode only (-XX:TraceJVMTI). > > Verified by running jvmti/jdi/jdb tests with tracing enabled. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: wrong thread state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19275/files - new: https://git.openjdk.org/jdk/pull/19275/files/f8fd4744..12ddfca2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=02-03 Stats: 7 lines in 2 files changed: 2 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19275/head:pull/19275 PR: https://git.openjdk.org/jdk/pull/19275 From sviswanathan at openjdk.org Fri May 17 22:40:15 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 17 May 2024 22:40:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Thu, 16 May 2024 20:22:40 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510: >> >>> 1508: compare_big_haystack_to_needle(sizeKnown, size, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength, >>> 1509: needleLen, isU, DO_EARLY_BAILOUT, eq_mask, temp2, r10, _masm); >>> 1510: >> >> At this point hsLength is not the remaining length from hsPtrRet, would that cause a problem? If not, all the special paths in compare_big_haystack_to_needle need not be generated on this call. > > Not sure what you mean here. I *think* you mean that hsLength is not the length of the remaining bytes in the haystack, but the actual length. There may be an issue if that is correct, right? I'll investigate. Yes, that is what I meant. Thanks for investigating. 
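
Editorial illustration of the point confirmed in the reply above: when a scan resumes at a pointer that has advanced into the haystack, the length passed along has to be the bytes remaining from that pointer, not the full haystack length. The sketch below is plain C++, not the stub generator's assembler; `remaining_len` and `find_from` are hypothetical helpers invented for this note and do not correspond to anything in `stubGenerator_x86_64_string.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Bytes still available when a scan resumes at `cur` inside
// the haystack [start, start + total_len).
inline std::size_t remaining_len(const char* start, std::size_t total_len,
                                 const char* cur) {
    assert(cur >= start && cur <= start + total_len);
    return total_len - static_cast<std::size_t>(cur - start);
}

// Naive search for `needle` in the `len` bytes starting at `hs`.
// Returns the offset relative to `hs`, or -1 if there is no match.
inline std::ptrdiff_t find_from(const char* hs, std::size_t len,
                                const char* needle, std::size_t needle_len) {
    if (needle_len == 0) return 0;
    if (needle_len > len) return -1;
    for (std::size_t i = 0; i + needle_len <= len; ++i) {
        if (std::memcmp(hs + i, needle, needle_len) == 0) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

With a 6-byte haystack `"abcabc"` stored in front of unrelated bytes `"XYZ"`, resuming at offset 4 with `remaining_len` gives a bound of 2 and no match for `"cX"`. Passing the full length of 6 from the same pointer lets the search run past the end of the haystack and report a false match in the trailing bytes.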
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605594796 From sviswanathan at openjdk.org Fri May 17 22:43:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 17 May 2024 22:43:05 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: <5DbhciTOeJf2n_vsG_R2r35-vFFp3QH3mmOX9hrqC3g=.9117cc86-a514-4e9b-a5d4-7108e72170ae@github.com> On Thu, 16 May 2024 17:08:21 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: >> >>> 236: const Register needle = rdx; >>> 237: const Register needle_len = rcx; >>> 238: >> >> This is the calling convention on Linux. How is windows platform handled? > > The entry code switches Windows calling convention into Linux calling convention by moving/saving registers, which are properly restored on function exit. This makes register tracking easier. I don't see the place where the switch is happening before this initial piece of code. You also have windows tests failing in the GHA. Could you please double check? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605596148 From sgibbons at openjdk.org Fri May 17 23:47:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 17 May 2024 23:47:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>
>
> Benchmark Score Latest
> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x
> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x
> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x
> StringIndexOf.constantPattern 9.361 11.906 1.271872663x
> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x
> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x
> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x
> StringIndexOf.success 9.186 9.713 1.057369911x
> StringIndexOf.successBig 14.341 46.343 3.231504079x
> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x
> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x
> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x
> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x
> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x
> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x
> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x
> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  Addressing lots of comments. Interim commit.
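
Editorial note: the "on average 1.3x" figure in the quoted description can be checked against the table, assuming the last column is Latest divided by Score and the average is a plain arithmetic mean of those ratios (the message does not say which mean was used). A minimal sketch; `Row`, `speedup`, `mean_speedup` and `index_of_rows` are names invented here, and the numbers are copied from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One table row: baseline score and score with the patch applied.
struct Row {
    double baseline;
    double latest;
};

// Ratio shown in the last column of the table: latest / baseline.
inline double speedup(const Row& r) {
    assert(r.baseline > 0.0);
    return r.latest / r.baseline;
}

// Arithmetic mean of the per-benchmark ratios (assumed averaging method).
inline double mean_speedup(const std::vector<Row>& rows) {
    if (rows.empty()) return 0.0;
    double sum = 0.0;
    for (const Row& r : rows) sum += speedup(r);
    return sum / static_cast<double>(rows.size());
}

// The 17 rows of the StringIndexOf / StringIndexOfChar table, in order.
inline std::vector<Row> index_of_rows() {
    return {
        {343.573, 317.934},     {1039.081, 1053.96},   {55.828, 110.541},
        {9.361, 11.906},        {4.216, 4.218},        {3.133, 3.216},
        {3.76, 3.761},          {9.186, 9.713},        {14.341, 46.343},
        {6220.918, 12154.52},   {5503.556, 5540.044},  {6978.854, 6818.689},
        {5657.499, 5474.624},   {7132.541, 6863.359},  {16013.389, 16162.437},
        {7386.123, 14771.622},  {9901.671, 9782.245},
    };
}
```

Under those assumptions `mean_speedup(index_of_rows())` comes out at roughly 1.32, consistent with the stated 1.3x. The mean is dominated by a few rows (`successBig`, `advancedWithShortSub2` and the two `_String` rows near 2x), while most other rows sit within a few percent of 1.0.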
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/fb4da92a..9a861979 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=18-19 Stats: 1639 lines in 9 files changed: 429 ins; 683 del; 527 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 17 23:56:08 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 17 May 2024 23:56:08 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: <8-W2sMyDMG71FBi7q_BLwiRoUj5Drr_J2IHiJPAtXd8=.a92b0aa8-402b-4d3e-9eb5-60e5d125920a@github.com> On Tue, 14 May 2024 18:38:38 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1784: > >> 1782: __ subq(tmp, haystack_len); >> 1783: } >> 1784: __ leaq(haystack, Address(rsp, tmp, Address::times_1)); > > This whole code is repeated in two places. Could be made into a function and used at both places. This is the only place now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605617739 From sgibbons at openjdk.org Fri May 17 23:56:07 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 17 May 2024 23:56:07 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: <5DbhciTOeJf2n_vsG_R2r35-vFFp3QH3mmOX9hrqC3g=.9117cc86-a514-4e9b-a5d4-7108e72170ae@github.com> References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> <5DbhciTOeJf2n_vsG_R2r35-vFFp3QH3mmOX9hrqC3g=.9117cc86-a514-4e9b-a5d4-7108e72170ae@github.com> Message-ID: On Fri, 17 May 2024 22:40:50 GMT, Sandhya Viswanathan wrote: >> The entry code switches Windows calling convention into Linux calling convention by moving/saving registers, which are properly restored on function exit. This makes register tracking easier. > > I don't see the place where the switch is happening before this initial piece of code. You also have windows tests failing in the GHA. Could you please double check? Fixed to use c_rargX ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605618391 From sgibbons at openjdk.org Sat May 18 00:02:17 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 18 May 2024 00:02:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Tue, 14 May 2024 00:38:30 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1178: > >> 1176: __ andq(eq_mask, lastMask); >> 1177: if (needToSaveRCX) { >> 1178: __ movdq(rcx, saveRCX); > > movdq is an expensive instruction (about 3 cycle). If we have another gpr temporary available here for shiftVal, then we dont need to do save/restore rcx. No longer need to use rcx. Refactored. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605619614 From sgibbons at openjdk.org Sat May 18 00:02:17 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 18 May 2024 00:02:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:18:02 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 40: > >> 38: private static boolean failure = false; >> 39: public static void main(String[] args) throws Exception { >> 40: String testName = "IndexOf"; > > intentation Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605619940 From wkemper at openjdk.org Sat May 18 00:16:19 2024 From: wkemper at openjdk.org (William Kemper) Date: Sat, 18 May 2024 00:16:19 GMT Subject: RFR: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active [v2] In-Reply-To: References: Message-ID: > This is consistent with c1 and other platforms. William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Make all the barriers and verifiers use gc_state Still need to fix verifier for c2 - Revert "Check for satb active flag, rather than gc state" This reverts commit 2769c97cbf5313c5c0f1336060ec39cb66584e3c. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/19180/files - new: https://git.openjdk.org/jdk/pull/19180/files/2769c97c..3271fecf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19180&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19180&range=00-01 Stats: 73 lines in 7 files changed: 17 ins; 31 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/19180.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19180/head:pull/19180 PR: https://git.openjdk.org/jdk/pull/19180 From amenkov at openjdk.org Sat May 18 00:53:21 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 18 May 2024 00:53:21 GMT Subject: RFR: 8331683: Clean up GetCarrierThread Message-ID: JVMTI GetCarrierThread extension function was introduced by loom for testing. It's used by several tests in hotspot/jtreg/serviceability. Testings: tier1..tier6 ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/19289/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19289&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331683 Stats: 37 lines in 3 files changed: 4 ins; 27 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19289/head:pull/19289 PR: https://git.openjdk.org/jdk/pull/19289 From duke at openjdk.org Sat May 18 01:06:05 2024 From: duke at openjdk.org (xiaotaonan) Date: Sat, 18 May 2024 01:06:05 GMT Subject: RFR: 8301464: Code in GenFullCP is still disabled after JDK-8079697 was fixed In-Reply-To: References: Message-ID: On Wed, 15 May 2024 07:40:28 GMT, Stefan Karlsson wrote: >> Code in GenFullCP is still disabled after JDK-8079697 was fixed >> note:I have not found any relevant information on why ClassWriter.COMPUTE_FRAMES is disabled in JDK-8079697. > > This is not related to GC code, could you remove the hotspot-gc label you added? 
@stefank @mdinacci @hns @landonf ------------- PR Comment: https://git.openjdk.org/jdk/pull/19228#issuecomment-2118539897 From dlong at openjdk.org Sat May 18 02:14:02 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 18 May 2024 02:14:02 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Fri, 17 May 2024 11:51:25 GMT, Tobias Holenstein wrote: > But do you think we can integrate this PR until JDK-8328306 is ready? Sure, fine with me! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19102#issuecomment-2118600312 From duke at openjdk.org Sat May 18 09:07:18 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 18 May 2024 09:07:18 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v12] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: > follow up 8267941 Lei Zaakjyu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision:

  restore

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18871/files
  - new: https://git.openjdk.org/jdk/pull/18871/files/dafdc775..c240897a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=10-11

  Stats: 19 lines in 6 files changed: 0 ins; 0 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/18871.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871

PR: https://git.openjdk.org/jdk/pull/18871

From jpai at openjdk.org Sat May 18 11:43:02 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Sat, 18 May 2024 11:43:02 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8]
In-Reply-To:
References:
Message-ID: <_4CgX7Ojzb5QH2sJ4k2fDgfz_zba03l_4feYaVyzhl0=.a6128ce8-56c3-4b71-a0e3-cf48c9b68c3e@github.com>

On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore wrote:

>> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways:
>>
>> * `System::load` and `System::loadLibrary` are now restricted methods
>> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods
>> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation
>>
>> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` flag was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access` flag is used at least once.
>> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Marked as reviewed by jpai (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19213#pullrequestreview-2064736036 From amitkumar at openjdk.org Sun May 19 10:29:04 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 19 May 2024 10:29:04 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Sun, 21 Apr 2024 16:30:43 GMT, Amit Kumar wrote: > s390x port for recursive locking. 
>
> testing:
> - [x] build fastdebug-vm
> - [x] build slowdebug-vm
> - [x] build release-vm
> - [x] build optimized-vm
> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (release-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm)
>   - [x] with C1
>   - [x] with C2
>   - [x] with interpreter
> - [x] tier1 with fastdebug-vm
> - [x] tier1 with slowdebug-vm
> - [x] tier1 with release-vm
>
> *BenchMarks*:
>
> Results from Performance LPARs :
>
>
> Locking Mode = 1 (without Patch)
>
> Benchmark (innerCount) Mode Cnt Score Error Units
> LockUnlock.testContendedLock 100 avgt 12 5.144 ± 0.035 ns/op
> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ± 89.475 ns/op
> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ± 0.559 ns/op
> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ± 3.036 ns/op
> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ± 1.793 ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
> Locking Mode = 1 (with patch)
>
> Benchmark (innerCount) Mode Cnt Score Error Units
> LockUnlock.testContendedLock 100 avgt 12 5.146 ± 0.027 ns/op
> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ± 75.863 ns/op
> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ± 0.519 ns/op
> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ± 2.103 ns/op
> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ± 2.229 ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
>
>
>
> Locking Mode = 2 (without Patch)
>
> Benchmark (innerCount) Mode Cnt Score Error Units
> LockUnlock.testContendedLock 100 avgt 12 4.688 ± 0.051 ns/op
> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ± 92.265 ns/op
> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ± 2.229 ns/op
> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ± 0.416 ns/op
> LockUnlock.testSimpleLockUnlock 100 avgt 12 424.241 ± 0.840 ns/op
> Finished running test 'micro:vm.lang.Lo...

@TheRealMDoerr Would you please review this one :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18878#issuecomment-2119183356

From dholmes at openjdk.org Sun May 19 22:25:01 2024
From: dholmes at openjdk.org (David Holmes)
Date: Sun, 19 May 2024 22:25:01 GMT
Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 17 May 2024 22:31:32 GMT, Leonid Mesnik wrote:

>> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state.
>>
>> It happens when thread_name is set for tracing from jvmti functions.
>> See:
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649
>>
>> The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state.
>>
>> The change should affect JVMTI trace mode only (-XX:TraceJVMTI).
>>
>> Verified by running jvmti/jdi/jdb tests with tracing enabled.
>
> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision:
>
>   wrong thread state

I don't understand the additional changes because they read the current thread's name, whereas this issue is about reading an arbitrary thread's name when the current thread happens to be in the wrong state. ???

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19275#issuecomment-2119378363

From fyang at openjdk.org Mon May 20 02:04:01 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 20 May 2024 02:04:01 GMT
Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register
In-Reply-To: <6hYm5BI8U_kB2R5XolQoBK9dXvmlmlynwhm7pt7Pi-g=.b168fee4-bff2-42e5-8816-b97776135a2c@github.com>
References: <6hYm5BI8U_kB2R5XolQoBK9dXvmlmlynwhm7pt7Pi-g=.b168fee4-bff2-42e5-8816-b97776135a2c@github.com>
Message-ID: <03bVuz3hv7OWpWQbFRH_CeRnu0TMW9VF-kg5QMpu3PA=.296bd4c9-4404-4271-bd15-282707bb1e6b@github.com>

On Thu, 16 May 2024 07:52:40 GMT, Robbin Ehn wrote:

> Yes, but it's a long term job, as you need to free a register in many cases. (in non-call sites places) All callsites should be easy to change as you have plenty of callee saved registers which are already saved when using movptr.

OK, I guess this might be a good compromise. Inspired by PPC's `Assembler::load_const`, `MacroAssembler::get_const` and `MacroAssembler::patch_const` [1-3], I think we could have a similar design. Adding one extra tmp register param for `movptr` like `void movptr(Register Rd, address addr, int32_t &offset, Register tmp=noreg);`, we can factor out `li48` then. The only difference compared with PPC's solution is that we will have different sizes depending on whether we could find a tmp register for `movptr`. But I guess that's not a big issue? We can add a reference param (say, `size`) to existing `is_movptr_at/is_movptr`, we get the correct size when checking the instruction sequence and return this in `size`.

I don't think it's a good idea to have more `li` variants like `li48`. We should also remove the existing `li64` which is not used anywhere. Could you please consider this?
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/assembler_ppc.cpp#L323 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L327 [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp#L349 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2119537134 From dholmes at openjdk.org Mon May 20 03:53:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 20 May 2024 03:53:03 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v4] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 22:31:32 GMT, Leonid Mesnik wrote: >> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. >> >> It happens when thread_name is set for tracing from jvmti functions. >> See: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 >> >> The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. >> >> The change should affect JVMTI trace mode only (-XX:TraceJVMTI). >> >> Verified by running jvmti/jdi/jdb tests with tracing enabled. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > wrong thread state Okay now I get it. Even once the function is made truly safe, we are always calling it from an unsafe state and so will get the default `Thread::name` response. So now, after any transition to the VM the name is read again to get a good value. This seems a good enhancement though I have to wonder if the apparent changing of the thread name in the tracing might cause problems. 
The tracing really needs to include a unique thread identifier. Thanks src/hotspot/share/prims/jvmtiEventController.cpp line 961: > 959: JvmtiEventControllerPrivate::change_field_watch(jvmtiEvent event_type, bool added) { > 960: int *count_addr; > 961: Nit: this file doesn't need to be touched. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19275#pullrequestreview-2065287663 PR Review Comment: https://git.openjdk.org/jdk/pull/19275#discussion_r1606209679 From alanb at openjdk.org Mon May 20 06:02:01 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 20 May 2024 06:02:01 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v4] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 22:31:32 GMT, Leonid Mesnik wrote: >> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. >> >> It happens when thread_name is set for tracing from jvmti functions. >> See: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 >> >> The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. >> >> The change should affect JVMTI trace mode only (-XX:TraceJVMTI). >> >> Verified by running jvmti/jdi/jdb tests with tracing enabled. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > wrong thread state Are there tests that run with TraceJVMTI so that this option is tested? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19275#issuecomment-2119724541 From jsjolen at openjdk.org Mon May 20 09:19:41 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 20 May 2024 09:19:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v93] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. 
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 182 commits: - Merge remote-tracking branch 'openjdk/master' into nmt-physical-device - Allow for up to 3 extra in depth - Move definition of struct to gain external linkage Due to mgronlund - Fix visit_in_order tests - Find a closer bound for treap depth and express it in base-2 log - Add corresponding tests to visit_in_order when applicable - Remove usage of auto in tests - Merge remote-tracking branch 'openjdk/master' into nmt-physical-device - Don't look at val, look at key - Fix test - ...
and 172 more: https://git.openjdk.org/jdk/compare/b92bd671...1c3fb154 ------------- Changes: https://git.openjdk.org/jdk/pull/18289/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=92 Stats: 2083 lines in 20 files changed: 1978 ins; 86 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Mon May 20 10:20:39 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 20 May 2024 10:20:39 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v94] In-Reply-To: References: Message-ID: <73qpoaZCJQ7X9eD7PSWGCC8MYKSawrXsTU7Rus3rBuA=.d3018090-3c16-44fa-9583-7b2ed2e14f16@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Fix memory leak, store the links in NativeCallStackStorage in Arena - Accidental issue with docs, remove ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/1c3fb154..95fc3ba2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=93 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=92-93 Stats: 12 lines in 1 file changed: 3 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Mon May 20 10:32:11 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 20 May 2024 10:32:11 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v94] In-Reply-To: <73qpoaZCJQ7X9eD7PSWGCC8MYKSawrXsTU7Rus3rBuA=.d3018090-3c16-44fa-9583-7b2ed2e14f16@github.com> References: <73qpoaZCJQ7X9eD7PSWGCC8MYKSawrXsTU7Rus3rBuA=.d3018090-3c16-44fa-9583-7b2ed2e14f16@github.com> Message-ID: On Mon, 20 May 2024 10:20:39 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Fix memory leak, store the links in NativeCallStackStorage in Arena > - Accidental issue with docs, remove In a future RFE NativeCallStackStorage can store its links and references in a linear free list allocator, cutting down from 16 bytes per link to 8 bytes. We may be able to cut down on the number of buckets significantly, as we'll have better locality on our side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2120158986 From jsjolen at openjdk.org Mon May 20 12:03:28 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 20 May 2024 12:03:28 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v95] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision: - Remove unused include - Basic tests for NativeCallStackStorage - Allow for passing in nr of buckets - Remove friend-ness ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/95fc3ba2..4a68e141 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=94 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=93-94 Stats: 76 lines in 4 files changed: 61 ins; 8 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From rehn at openjdk.org Mon May 20 13:15:15 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 20 May 2024 13:15:15 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: Message-ID: > Hi, please consider! > > Materializing a 48-bit pointer, using an additional register, we can do with: > lui + lui + slli + add + addi > This 15% faster both on VF2 and in CPU models, compared to movptr(). > > As we often materialize during calls there is free registers. > > I have choose just a few spot to use it, many more can use. > E.g. la() with tmp register can use li48 instead of movptr. > > Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. > And benchmarks when hardware is free. Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - li48 -> movptr - Merge branch 'master' into 8332265 - li48 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19246/files - new: https://git.openjdk.org/jdk/pull/19246/files/f5efaca8..edfdda28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=00-01 Stats: 41195 lines in 524 files changed: 34199 ins; 5033 del; 1963 mod Patch: https://git.openjdk.org/jdk/pull/19246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246 PR: https://git.openjdk.org/jdk/pull/19246 From rehn at openjdk.org Mon May 20 13:27:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 20 May 2024 13:27:07 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: Message-ID: <2ioigf_50oVYuRPaFw03fF94pEqmpJwljL9J3vY6qYM=.d59502c0-29e3-4cac-a3fa-021e53380346@github.com> On Mon, 20 May 2024 13:15:15 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 Hey, I did an update, not fully what you are saying. 
We are missing a lot of 'abstraction'; e.g., take a look at CodeInstaller::pd_next_offset in jvmciCodeInstaller. I think this code should look like:

jint CodeInstaller::pd_next_offset(NativeInstruction* inst, jint pc_offset, JVMCI_TRAPS) {
  if (inst->is_call() || inst->is_jump() || inst->is_movptr()) {
    return pc_offset + inst->size();
  }
  JVMCI_ERROR_0("unsupported type of instruction for call site");
}

But I would need to add a bunch of stuff to unrelated NativeInst, which I think is better suited to another PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2120457463 From aph at openjdk.org Mon May 20 14:02:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 20 May 2024 14:02:01 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier In-Reply-To: References: Message-ID: <7eML4nr0XN1_QVOO_2tk-yXf8W578S4qb1kA3AoaU8w=.81b03ff5-7ba8-496d-acfe-285ba3de2004@github.com> On Fri, 17 May 2024 08:57:20 GMT, kuaiwei wrote: > The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It shows a regression on some platforms, like Apple silicon on macOS > 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > It can be fixed by: > 1 Enable AlwaysMergeDMB by default, only disable it on architectures where we can see a performance improvement (N1 or N2) > 2 Check the special pattern and merge the subsequent dmb. > > It also fixes a bug where st/ld/dmb cannot be merged when the code buffer is expanding. I added unit tests for these. > > This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and cannot merge all three, because when emitting dmb.ish, merging all previous dmbs would shrink the code buffer. I think it may break some resumption and think it's not a common pattern. > > In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging.
But it looks risky to pending instruction during emitting. src/hotspot/cpu/aarch64/aarch64.ad line 7841: > 7839: ins_encode %{ > 7840: __ block_comment("membar_release"); > 7841: __ membar(Assembler::StoreStore); Do we need to respect `AlwaysMergeDMB`here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1606820799 From luhenry at openjdk.org Mon May 20 15:23:04 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 20 May 2024 15:23:04 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: Message-ID: <2H5-YPIxkeYRtmPE4P91lFico3lm7UrU63LyiOb0bxM=.7f0a81de-f879-4fd9-857d-00a2e1eeeb59@github.com> On Mon, 20 May 2024 13:15:15 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1669: > 1667: } > 1668: > 1669: void MacroAssembler::movptr_1(Register Rd, uint64_t imm64, int32_t &offset) { `movptr1` instead, to make it easily searchable with `is_movptr1_at` src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1688: > 1686: } > 1687: > 1688: void MacroAssembler::movptr_2(Register Rd, uint64_t addr, int32_t &offset, Register tmp) { Also `movptr2` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1606918839 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1606919047 From wkemper at openjdk.org Mon May 20 16:59:25 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 20 May 2024 16:59:25 GMT Subject: RFR: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active [v3] In-Reply-To: References: Message-ID: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com> > This is consistent with c1 and other platforms. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19180/files - new: https://git.openjdk.org/jdk/pull/19180/files/3271fecf..cff7dc0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19180&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19180&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19180.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19180/head:pull/19180 PR: https://git.openjdk.org/jdk/pull/19180 From lmesnik at openjdk.org Mon May 20 17:13:01 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 20 May 2024 17:13:01 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v4] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 22:31:32 GMT, Leonid Mesnik wrote: >> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. >> >> It happens when thread_name is set for tracing from jvmti functions. >> See: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 >> >> The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. >> >> The change should affect JVMTI trace mode only (-XX:TraceJVMTI). >> >> Verified by running jvmti/jdi/jdb tests with tracing enabled. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > wrong thread state There are no tests currently executed with TraceJVMTI. I am thinking about adding execution of the svc testing. 
However, I've got another failure that should be resolved first: https://bugs.openjdk.org/browse/JDK-8332536 It is not related to the current issue. It probably also makes sense to add some basic logging tests that verify the log content. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19275#issuecomment-2120856531 From prr at openjdk.org Mon May 20 18:50:04 2024 From: prr at openjdk.org (Phil Race) Date: Mon, 20 May 2024 18:50:04 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access` flag is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown.
This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Have you looked into / thought about how this will work for jpackaged apps? I suspect that both the existing FFM usage and this will be options the application packager will need to supply when building the jpackaged app - the end user cannot pass in command line VM options. Seems there should be some testing of this, as some kind of native access could be a common case for jpackaged apps. ------------- PR Review: https://git.openjdk.org/jdk/pull/19213#pullrequestreview-2066794950 From prr at openjdk.org Mon May 20 18:50:06 2024 From: prr at openjdk.org (Phil Race) Date: Mon, 20 May 2024 18:50:06 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: <-eahuR7pahX1Zuu-n4btdItPet2ZOE8fX1qYl6sn2s4=.c37ac0ad-bf97-4778-8c8b-c137390c7e14@github.com> On Mon, 13 May 2024 10:49:30 GMT, Maurizio Cimadamore wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > make/conf/module-loader-map.conf line 105: > >> 103: java.smartcardio \ >> 104: jdk.accessibility \ >> 105: jdk.attach \ > > The list of allowed modules has been rewritten from scratch, by looking at the set of modules containing at least one `native` method declaration. Should I understand this list to be the set of modules exempt from needing to specify that native access is allowed?
I.e., they always have native access without any warnings, and further, any attempt to enable warnings or to disable native access for these modules is ignored? > src/java.desktop/macosx/classes/com/apple/eio/FileManager.java line 61: > >> 59: } >> 60: >> 61: @SuppressWarnings({"removal", "restricted"}) > > There are several of these changes. One option might have been to just disable restricted warnings when building. But on a deeper look, I realized that in all these places we already disabled deprecation warnings for the use of security manager, so I also added a new suppression instead. Sounds reasonable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1607136237 PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1607136808 From alanb at openjdk.org Mon May 20 18:54:05 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 20 May 2024 18:54:05 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: <-eahuR7pahX1Zuu-n4btdItPet2ZOE8fX1qYl6sn2s4=.c37ac0ad-bf97-4778-8c8b-c137390c7e14@github.com> References: <-eahuR7pahX1Zuu-n4btdItPet2ZOE8fX1qYl6sn2s4=.c37ac0ad-bf97-4778-8c8b-c137390c7e14@github.com> Message-ID: On Mon, 20 May 2024 18:39:31 GMT, Phil Race wrote: >> make/conf/module-loader-map.conf line 105: >> >>> 103: java.smartcardio \ >>> 104: jdk.accessibility \ >>> 105: jdk.attach \ >> >> The list of allowed modules has been rewritten from scratch, by looking at the set of modules containing at least one `native` method declaration. > > Should I understand this list to be the set of modules exempt from needing to specify that native access is allowed? > I.e., they always have native access without any warnings, and further, any attempt to enable warnings or to disable native access for these modules is ignored? Yes, this was added via JDK-8327218. The changes in this PR are just trimming down the list to only the modules that have native code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19213#discussion_r1607147983 From kbarrett at openjdk.org Mon May 20 20:58:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 20 May 2024 20:58:02 GMT Subject: RFR: 8332448: Make SpaceMangler inherit AllStatic In-Reply-To: References: Message-ID: On Fri, 17 May 2024 09:29:07 GMT, Albert Mingkun Yang wrote: > Extract the state for `top` out of `SpaceMangler`. Users (Serial and Parallel GC) already tracks the top before/after GC. The "real" change in this PR are only two places: `serialFullGC.cpp` and `PSParallelCompact::post_compact`. > > Test: tier1-5 Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/serial/serialFullGC.cpp line 370: > 368: HeapWord* new_top = get_compaction_top(i); > 369: // Reset top and unused memory > 370: space->set_top(new_top); I'd rather the new_top variable declaration was after the comment rather than before. src/hotspot/share/gc/shared/spaceDecorator.cpp line 32: > 30: > 31: // Simply mangle the MemRegion mr. > 32: void SpaceMangler::mangle_region(MemRegion mr) { SpaceMangler::mangle_region should be debug-only rather than not-product, both conceptually, and because it has no effect in non-debug builds. ------------- PR Review: https://git.openjdk.org/jdk/pull/19279#pullrequestreview-2067015569 PR Review Comment: https://git.openjdk.org/jdk/pull/19279#discussion_r1607265118 PR Review Comment: https://git.openjdk.org/jdk/pull/19279#discussion_r1607273088 From ayang at openjdk.org Mon May 20 21:11:28 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 20 May 2024 21:11:28 GMT Subject: RFR: 8332448: Make SpaceMangler inherit AllStatic [v2] In-Reply-To: References: Message-ID: > Extract the state for `top` out of `SpaceMangler`. Users (Serial and Parallel GC) already tracks the top before/after GC. The "real" change in this PR are only two places: `serialFullGC.cpp` and `PSParallelCompact::post_compact`. 
> > Test: tier1-5 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into mangle - mangle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19279/files - new: https://git.openjdk.org/jdk/pull/19279/files/27c35a9b..e1d524cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19279&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19279&range=00-01 Stats: 33588 lines in 360 files changed: 29030 ins; 3490 del; 1068 mod Patch: https://git.openjdk.org/jdk/pull/19279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19279/head:pull/19279 PR: https://git.openjdk.org/jdk/pull/19279 From kbarrett at openjdk.org Mon May 20 21:20:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 20 May 2024 21:20:02 GMT Subject: RFR: 8332448: Make SpaceMangler inherit AllStatic [v2] In-Reply-To: References: Message-ID: <604CdyLgCWeeWPJpwvO8tBUmetnmC2YCXNJVzJkJ7Gk=.f9486273-1404-4cef-a89d-185430e00d75@github.com> On Mon, 20 May 2024 21:11:28 GMT, Albert Mingkun Yang wrote: >> Extract the state for `top` out of `SpaceMangler`. Users (Serial and Parallel GC) already tracks the top before/after GC. The "real" change in this PR are only two places: `serialFullGC.cpp` and `PSParallelCompact::post_compact`. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into mangle > - mangle Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19279#pullrequestreview-2067060273 From gziemski at openjdk.org Mon May 20 23:46:10 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 20 May 2024 23:46:10 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v95] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 12:03:28 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment.
>> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision: > > - Remove unused include > - Basic tests for NativeCallStackStorage > - Allow for passing in nr of buckets > - Remove friend-ness Changes requested by gziemski (Committer). test/hotspot/gtest/nmt/test_nmt_treap.cpp line 326: > 324: for (int i = 0; i < ten_thousand; i++) { > 325: int r = os::random(); > 326: if (r >= 0) { I think `os::random()` will only return positive numbers, so this test case will only call `upsert` and will never call `remove`.
Instead of: ` if (r >= 0) {` we should do: ` if (r%2 == 0) {` ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2067220840 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1607406043 From sspitsyn at openjdk.org Mon May 20 23:47:01 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 May 2024 23:47:01 GMT Subject: RFR: 8331683: Clean up GetCarrierThread In-Reply-To: References: Message-ID: On Sat, 18 May 2024 00:47:59 GMT, Alex Menkov wrote: > The JVMTI GetCarrierThread extension function was introduced by loom for testing. > It's used by several tests in hotspot/jtreg/serviceability. > > Testing: tier1..tier6 Looks good. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19289#pullrequestreview-2067222107 From duke at openjdk.org Tue May 21 03:04:06 2024 From: duke at openjdk.org (kuaiwei) Date: Tue, 21 May 2024 03:04:06 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier In-Reply-To: <7eML4nr0XN1_QVOO_2tk-yXf8W578S4qb1kA3AoaU8w=.81b03ff5-7ba8-496d-acfe-285ba3de2004@github.com> References: <7eML4nr0XN1_QVOO_2tk-yXf8W578S4qb1kA3AoaU8w=.81b03ff5-7ba8-496d-acfe-285ba3de2004@github.com> Message-ID: On Mon, 20 May 2024 13:59:31 GMT, Andrew Haley wrote: >> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: >> 1. It shows a regression on some platforms, such as Apple silicon on macOS. >> 2. It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld". >> >> It can be fixed by: >> 1. Enabling AlwaysMergeDMB by default, and only disabling it on architectures where we see a performance improvement (N1 or N2). >> 2. Checking for the special pattern and merging the subsequent dmb. >> >> It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these. >> >> This patch still has an unhandled case.
For an instruction sequence like "dmb.ishld; dmb.ishst; dmb.ish", it merges the last 2 instructions but cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I don't think it's a common pattern. >> >> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging, but it looks risky to keep an instruction pending during emission. > > src/hotspot/cpu/aarch64/aarch64.ad line 7841: > >> 7839: ins_encode %{ >> 7840: __ block_comment("membar_release"); >> 7841: __ membar(Assembler::StoreStore); > > Do we need to respect `AlwaysMergeDMB` here? Yes, usually they can be merged in macroAssembler, but it can help to reduce the possibility of unmerged cases. Thanks for pointing it out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1607532915 From thartmann at openjdk.org Tue May 21 05:26:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 05:26:08 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: Message-ID: <-QcJMgC_7JqcWcmeY2MaB9Mh7Yq7f13q5KhyHGOC4yc=.f49a8233-4046-4b12-92ca-0d402717c513@github.com> On Mon, 8 Apr 2024 02:35:33 GMT, Jatin Bhateja wrote: >> This bug fix patch tightens the predication check for the small constant length clear array pattern and relaxes the associated feature checks. Modified a few comments for clarity. >> >> Kindly review and approve. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup predicates. This introduced a performance regression, see [JDK-8332487](https://bugs.openjdk.org/browse/JDK-8332487). @jatin-bhateja, could you please have a look? Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18464#issuecomment-2121764497 From fyang at openjdk.org Tue May 21 06:00:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 May 2024 06:00:07 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: Message-ID: <2f25EhAHETKwXhFcg6nE_W37QAU7U7opYHa8Wzo2MfU=.05e5cfce-2d3b-4825-a8af-7963d4c266f7@github.com> On Mon, 20 May 2024 13:15:15 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> With an additional register, we can materialize a 48-bit pointer with: >> lui + lui + slli + add + addi >> This is 15% faster than movptr(), both on VF2 and in CPU models. >> >> As we often materialize during calls, there are free registers. >> >> I have chosen just a few spots to use it; many more can use it. >> E.g. la() with a tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good); if I screwed up IC calls, it should show up quickly. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 Thanks for the update. Taking a closer look. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1426: > 1424: } > 1425: > 1426: static int patch_addr_in_movptr2(address instruction_address, address target) { Can we have a common entry of `patch_addr_in_movptr` which delegates work to `patch_addr_in_movptr1` and `patch_addr_in_movptr2`? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1526: > 1524: } > 1525: > 1526: static address get_target_of_movptr2(address insn_addr) { Similar here. Maybe we can have a common entry of `get_target_of_movptr` which delegates work to `get_target_of_movptr1` and `get_target_of_movptr2`?
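One way to picture the common entry point suggested above is sketched below. This is only an illustration under assumptions, not the code in the PR: the `address` typedef, the one-byte tag check in `is_movptr1_at`/`is_movptr2_at`, and the stub bodies and return values of `patch_addr_in_movptr1`/`patch_addr_in_movptr2` are hypothetical stand-ins for the real instruction decoding and patching in `macroAssembler_riscv.cpp`.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's `address` type.
typedef unsigned char* address;

// Hypothetical recognizers: a real implementation would decode the RISC-V
// instruction sequence; here the first byte is used as a tag for illustration.
static bool is_movptr1_at(address insn) { return insn[0] == 0x01; }
static bool is_movptr2_at(address insn) { return insn[0] == 0x02; }

// Stubs for the per-sequence patchers. The return values (1 and 2) only
// identify which path ran; they are not real patched-byte counts.
static int patch_addr_in_movptr1(address insn, address target) {
  (void)insn; (void)target;
  return 1;
}

static int patch_addr_in_movptr2(address insn, address target) {
  (void)insn; (void)target;
  return 2;
}

// Common entry: inspect the sequence once, then delegate to the matching patcher.
static int patch_addr_in_movptr(address insn, address target) {
  if (is_movptr1_at(insn)) {
    return patch_addr_in_movptr1(insn, target);
  }
  assert(is_movptr2_at(insn) && "expected a movptr1 or movptr2 sequence");
  return patch_addr_in_movptr2(insn, target);
}
```

With this shape, a caller that does not care which sequence was emitted would call only `patch_addr_in_movptr`; `get_target_of_movptr` could follow the same pattern. Whether the extra layer is worthwhile is a design choice for the PR.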
------------- PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2067597362 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1607681377 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1607681874 From cjplummer at openjdk.org Tue May 21 06:12:07 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 May 2024 06:12:07 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v12] In-Reply-To: References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: <27qGzorWxtdq6HLmIMPLHZ6_qRbOZo2DvA7pewZfNKA=.3f11daeb-1645-466e-b4bb-56aab62021b2@github.com> On Sat, 18 May 2024 09:07:18 GMT, Lei Zaakjyu wrote: >> follow up 8267941 > > Lei Zaakjyu has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > restore Changes requested by cjplummer (Reviewer). test/hotspot/jtreg/runtime/cds/appcds/sharedStrings/SharedStringsHumongous.java line 90: > 88: // before dumping the string table. That means the heap should contain no > 89: // humongous regions. > 90: dumpOutput.shouldNotMatch("gc,region,cds. G1HeapRegion 0x[0-9a-f]* HUM"); Just a minor nit. I noticed a pre-existing typo on line 87 above. It says "kelp" instead of "kept". Can you fix it? ------------- PR Review: https://git.openjdk.org/jdk/pull/18871#pullrequestreview-2067620880 PR Review Comment: https://git.openjdk.org/jdk/pull/18871#discussion_r1607694956 From iwalulya at openjdk.org Tue May 21 07:00:03 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 21 May 2024 07:00:03 GMT Subject: RFR: 8332448: Make SpaceMangler inherit AllStatic [v2] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 21:11:28 GMT, Albert Mingkun Yang wrote: >> Extract the state for `top` out of `SpaceMangler`. 
Users (Serial and Parallel GC) already tracks the top before/after GC. The "real" change in this PR are only two places: `serialFullGC.cpp` and `PSParallelCompact::post_compact`. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into mangle > - mangle LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19279#pullrequestreview-2067696814 From thartmann at openjdk.org Tue May 21 07:14:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 07:14:10 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ±
10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > shenandoah verifier I'll send this through our testing and will report back once it passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2121914071 From alanb at openjdk.org Tue May 21 07:23:07 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 21 May 2024 07:23:07 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 18:47:35 GMT, Phil Race wrote: > Have you looked into / thought about how this will work for jpackaged apps ? I suspect that both the existing FFM usage and this will be options the application packager will need to supply when building the jpackaged app - the end user cannot pass in command line VM options. Seems there should be some testing of this as some kind of native access could be a common case for jpackaged apps. I don't see any tests in test/jdk/tools/jpackage that creates an application that uses JNI code.
Seems like a good idea to add this via another PR and it specify --java-options so that the application launcher enables native access. It could test jpackage using jlink too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2121927727 From thartmann at openjdk.org Tue May 21 07:24:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 21 May 2024 07:24:03 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ±
35.920 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > shenandoah verifier I'm getting some conflicts when trying to apply this to master. Could you please merge the PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2121929550 From ayang at openjdk.org Tue May 21 07:46:09 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 21 May 2024 07:46:09 GMT Subject: Integrated: 8332448: Make SpaceMangler inherit AllStatic In-Reply-To: References: Message-ID: On Fri, 17 May 2024 09:29:07 GMT, Albert Mingkun Yang wrote: > Extract the state for `top` out of `SpaceMangler`. Users (Serial and Parallel GC) already tracks the top before/after GC. The "real" change in this PR are only two places: `serialFullGC.cpp` and `PSParallelCompact::post_compact`. > > Test: tier1-5 This pull request has now been integrated.
Changeset: 5f2b8d02 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/5f2b8d0224868d09ff54e93fabe4a6db177aef8f Stats: 523 lines in 26 files changed: 9 ins; 488 del; 26 mod 8332448: Make SpaceMangler inherit AllStatic Reviewed-by: kbarrett, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/19279 From ayang at openjdk.org Tue May 21 07:46:08 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 21 May 2024 07:46:08 GMT Subject: RFR: 8332448: Make SpaceMangler inherit AllStatic [v2] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 21:11:28 GMT, Albert Mingkun Yang wrote: >> Extract the state for `top` out of `SpaceMangler`. Users (Serial and Parallel GC) already tracks the top before/after GC. The "real" change in this PR are only two places: `serialFullGC.cpp` and `PSParallelCompact::post_compact`. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into mangle > - mangle Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19279#issuecomment-2121969102 From clanger at openjdk.org Tue May 21 08:01:06 2024 From: clanger at openjdk.org (Christoph Langer) Date: Tue, 21 May 2024 08:01:06 GMT Subject: RFR: 8332473: ubsan: growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null In-Reply-To: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> References: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> Message-ID: On Fri, 17 May 2024 12:59:07 GMT, Matthias Baesken wrote: > On Linux x86_64 fastdebug with ubsan enabled we run into this error because we call qsort with a first parameter that is null. > > /jdk/src/hotspot/share/utilities/growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null > #0 0x150d701bb4b1 in GrowableArrayView::sort(int (*)(nmethod**, nmethod**)) /jdk/src/hotspot/share/utilities/growableArray.hpp:290 > #1 0x150d701bb4b1 in ClassUnloadingContext::free_nmethods() /jdk/src/hotspot/share/gc/shared/classUnloadingContext.cpp:159 > #2 0x150d71f5cca3 in G1CollectedHeap::unload_classes_and_code(char const*, BoolObjectClosure*, GCTimer*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:2538 > #3 0x150d71ffb009 in G1FullCollector::phase1_mark_live_objects() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:330 > #4 0x150d71ffc675 in G1FullCollector::collect() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:209 > #5 0x150d71f3e593 in G1CollectedHeap::do_full_collection(bool, bool) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:842 > #6 0x150d71f5b12d in G1CollectedHeap::satisfy_failed_allocation_helper(unsigned long, bool, bool, bool, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:917 > #7 0x150d71f5b3dc in G1CollectedHeap::satisfy_failed_allocation(unsigned long, bool*) 
/jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:930 > #8 0x150d721835f7 in VM_G1CollectForAllocation::doit() /jdk/src/hotspot/share/gc/g1/g1VMOperations.cpp:127 > #9 0x150d74291ec8 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #10 0x150d742ca1be in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #11 0x150d742cb9e7 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #12 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #13 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > > seems we sometimes call qsort with nullptr as first parameter, this is not recommended. > When adding a guarantee the same can be seen (_data is null). > So better add a check and do not sort, if there is nothing provided to be sorted . Marked as reviewed by clanger (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19283#pullrequestreview-2067826172 From shade at openjdk.org Tue May 21 08:01:07 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 May 2024 08:01:07 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v4] In-Reply-To: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> References: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> Message-ID: <5oqqqssGKeq67XEpuNL1T7g0U-X75igjYYXVWQb0Vq8=.73a6e149-bc83-46c9-8c0b-7c16059af533@github.com> On Fri, 17 May 2024 15:58:32 GMT, Aleksey Shipilev wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. 
Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. >> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. >> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add more GC triggers around Thanks for reviews! @stefank, I assume you are fine with the way we (lightly) touched ZGC code? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2121997820 From luhenry at openjdk.org Tue May 21 08:02:08 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 21 May 2024 08:02:08 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: <2f25EhAHETKwXhFcg6nE_W37QAU7U7opYHa8Wzo2MfU=.05e5cfce-2d3b-4825-a8af-7963d4c266f7@github.com> References: <2f25EhAHETKwXhFcg6nE_W37QAU7U7opYHa8Wzo2MfU=.05e5cfce-2d3b-4825-a8af-7963d4c266f7@github.com> Message-ID: On Tue, 21 May 2024 05:52:48 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - li48 -> movptr >> - Merge branch 'master' into 8332265 >> - li48 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1426: > >> 1424: } >> 1425: >> 1426: static int patch_addr_in_movptr2(address instruction_address, address target) { > > Can we have a common entry of `patch_addr_in_movptr` which delegates work to `patch_addr_in_movptr1` and `patch_addr_in_movptr2`? I think it makes sense to keep them split, as the distinction between movptr1 and movptr2 is already made in the caller. And IIUC we don't plan to merge movptr1 and movptr2 together at any point, so we don't particularly need to abstract them away. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1607822360 From luhenry at openjdk.org Tue May 21 08:02:09 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 21 May 2024 08:02:09 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: Message-ID: On Mon, 20 May 2024 13:15:15 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> With an additional register, we can materialize a 48-bit pointer with: >> lui + lui + slli + add + addi >> This is 15% faster than movptr(), both on VF2 and in CPU models. >> >> As we often materialize during calls, there are free registers. >> >> I have chosen just a few spots to use it; many more can use it. >> E.g. la() with a tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good); if I screwed up IC calls, it should show up quickly. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1603: > 1601: } else if (NativeInstruction::is_li32_at(insn_addr)) { // li32 > 1602: return get_target_of_li32(insn_addr); > 1603: } else if (NativeInstruction::is_movptr2_at(insn_addr)) { // movptr2 You could move that `else if` block right under the `NativeInstruction::is_movptr1_at` one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1607823211 From mbaesken at openjdk.org Tue May 21 08:14:08 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 21 May 2024 08:14:08 GMT Subject: Integrated: 8332473: ubsan: growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null In-Reply-To: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> References: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> Message-ID: On Fri, 17 May 2024 12:59:07 GMT, Matthias Baesken wrote: > On Linux x86_64 fastdebug with ubsan enabled we run into this error because we call qsort with a first parameter that is null. 
> > /jdk/src/hotspot/share/utilities/growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null > #0 0x150d701bb4b1 in GrowableArrayView::sort(int (*)(nmethod**, nmethod**)) /jdk/src/hotspot/share/utilities/growableArray.hpp:290 > #1 0x150d701bb4b1 in ClassUnloadingContext::free_nmethods() /jdk/src/hotspot/share/gc/shared/classUnloadingContext.cpp:159 > #2 0x150d71f5cca3 in G1CollectedHeap::unload_classes_and_code(char const*, BoolObjectClosure*, GCTimer*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:2538 > #3 0x150d71ffb009 in G1FullCollector::phase1_mark_live_objects() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:330 > #4 0x150d71ffc675 in G1FullCollector::collect() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:209 > #5 0x150d71f3e593 in G1CollectedHeap::do_full_collection(bool, bool) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:842 > #6 0x150d71f5b12d in G1CollectedHeap::satisfy_failed_allocation_helper(unsigned long, bool, bool, bool, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:917 > #7 0x150d71f5b3dc in G1CollectedHeap::satisfy_failed_allocation(unsigned long, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:930 > #8 0x150d721835f7 in VM_G1CollectForAllocation::doit() /jdk/src/hotspot/share/gc/g1/g1VMOperations.cpp:127 > #9 0x150d74291ec8 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #10 0x150d742ca1be in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #11 0x150d742cb9e7 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #12 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #13 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > > seems we sometimes call qsort with nullptr as first parameter, this is not recommended. > When adding a guarantee the same can be seen (_data is null). 
> So better add a check and do not sort, if there is nothing provided to be sorted . This pull request has now been integrated. Changeset: e529101e Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/e529101ea30b49a6601088ce5ab81df590fc52f0 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8332473: ubsan: growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null Reviewed-by: jsjolen, clanger ------------- PR: https://git.openjdk.org/jdk/pull/19283 From stefank at openjdk.org Tue May 21 08:37:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 21 May 2024 08:37:05 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v4] In-Reply-To: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> References: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> Message-ID: On Fri, 17 May 2024 15:58:32 GMT, Aleksey Shipilev wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. >> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. I think those paths are for OSR and deopts, so their performance is unlikely to be critical. 
This PR already covers the concurrent GC paths well. >> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add more GC triggers around Yes, ZGC code looks fine. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2122074293 From mcimadamore at openjdk.org Tue May 21 08:47:05 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 21 May 2024 08:47:05 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 07:20:05 GMT, Alan Bateman wrote: > > Have you looked into / thought about how this will work for jpackaged apps ? I suspect that both the existing FFM usage and this will be options the application packager will need to supply when building the jpackaged app - the end user cannot pass in command line VM options. Seems there should be some testing of this as some kind of native access could be a common case for jpackaged apps. > > I don't see any tests in test/jdk/tools/jpackage that creates an application that uses JNI code. Seems like a good idea to add this via another PR and it specify --java-options so that the application launcher enables native access. It could test jpackage using jlink too. These are all good suggestions. I have not looked into jpackage, but yes, I would expect that the jpackage user would need to provide extra options when packaging the application. The same is true for creating JDK image jlink (which we use in the jextract build) - although, in that case the end user also has the possibility to pass options on the command line. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2122095444

From rehn at openjdk.org  Tue May 21 08:56:19 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 21 May 2024 08:56:19 GMT
Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v3]
In-Reply-To:
References:
Message-ID:

> Hi, please consider!
>
> With an additional register, we can materialize a 48-bit pointer with:
> lui + lui + slli + add + addi
> This is 15% faster than movptr(), both on VF2 and in CPU models.
>
> As we often materialize during calls, there are free registers.
>
> I have chosen just a few spots to use it; many more could use it.
> E.g. la() with a tmp register can use li48 instead of movptr.
>
> Running tests now (so far so good); if I screwed up IC calls, it should show up quickly.
> And benchmarks when hardware is free.

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision:

 - Merge branch 'master' into 8332265
 - Small review update
 - li48 -> movptr
 - Merge branch 'master' into 8332265
 - li48

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19246/files
  - new: https://git.openjdk.org/jdk/pull/19246/files/edfdda28..c406294a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=01-02

  Stats: 1130 lines in 62 files changed: 557 ins; 502 del; 71 mod
  Patch: https://git.openjdk.org/jdk/pull/19246.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246

PR: https://git.openjdk.org/jdk/pull/19246

From luhenry at openjdk.org  Tue May 21 08:56:19 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 21 May 2024 08:56:19 GMT
Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v3]
In-Reply-To:
References:
Message-ID: <11cTZDsDanZQl1JRMmWTzj4hU53WuXEfYUiOL-Qowcs=.06e33378-bf1c-4728-9ea4-c4771cec5bf8@github.com>

On Tue, 21 May 2024 08:53:28 GMT, Robbin Ehn wrote:

>> Hi, please consider!
>>
>> With an additional register, we can materialize a 48-bit pointer with:
>> lui + lui + slli + add + addi
>> This is 15% faster than movptr(), both on VF2 and in CPU models.
>>
>> As we often materialize during calls, there are free registers.
>>
>> I have chosen just a few spots to use it; many more could use it.
>> E.g. la() with a tmp register can use li48 instead of movptr.
>>
>> Running tests now (so far so good); if I screwed up IC calls, it should show up quickly.
>> And benchmarks when hardware is free.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into 8332265 > - Small review update > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2067976983 From rehn at openjdk.org Tue May 21 08:56:21 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 May 2024 08:56:21 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 07:59:12 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - li48 -> movptr >> - Merge branch 'master' into 8332265 >> - li48 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1603: > >> 1601: } else if (NativeInstruction::is_li32_at(insn_addr)) { // li32 >> 1602: return get_target_of_li32(insn_addr); >> 1603: } else if (NativeInstruction::is_movptr2_at(insn_addr)) { // movptr2 > > You could move that `else if` block right under the `NativeInstruction::is_movptr1_at` one. 
Fixed > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1669: > >> 1667: } >> 1668: >> 1669: void MacroAssembler::movptr_1(Register Rd, uint64_t imm64, int32_t &offset) { > > `movptr1` instead, to make it easily searchable with `is_movptr1_at` Fixed > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1688: > >> 1686: } >> 1687: >> 1688: void MacroAssembler::movptr_2(Register Rd, uint64_t addr, int32_t &offset, Register tmp) { > > Also `movptr2` Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1607911213 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1607910772 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1607910943 From jsjolen at openjdk.org Tue May 21 09:37:41 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 21 May 2024 09:37:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v96] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
>     static MemoryFile* make_device(const char* descriptive_name);
>     static void free_device(MemoryFile* device);
>
>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                                 MEMFLAGS flag, const NativeCallStack& stack);
>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
>   void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>     MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>   }
>   void ZNMT::commit(zoffset offset, size_t size) {
>     MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>   }
>   void ZNMT::uncommit(zoffset offset, size_t size) {
>     MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>   }
>
>   void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>     // NMT doesn't track mappings at the moment.
>   }
>   void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>     // NMT doesn't track mappings at the moment.
>   }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Switch to even/odd distinction

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/4a68e141..0dababc3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=95
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=94-95

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org  Tue May 21 09:37:41 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 21 May 2024 09:37:41 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v95]
In-Reply-To:
References:
Message-ID:

On Mon, 20 May 2024 23:42:58 GMT, Gerard Ziemski wrote:

>> Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision:
>>
>>  - Remove unused include
>>  - Basic tests for NativeCallStackStorage
>>  - Allow for passing in nr of buckets
>>  - Remove friend-ness
>
> test/hotspot/gtest/nmt/test_nmt_treap.cpp line 326:
>
>> 324:     for (int i = 0; i < ten_thousand; i++) {
>> 325:       int r = os::random();
>> 326:       if (r >= 0) {
>
> I think `os::random()` will only return positive numbers, so this test case will only call `upsert` and will never call `remove`.
>
> Instead of:
>
> `      if (r >= 0) {`
>
> we should do:
>
> `      if (r%2 == 0) {`

Thanks! I'll fix that.
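
To make the difference between the two conditions concrete, here is a minimal standalone sketch; it is not the actual gtest. It assumes a generator that, as described above, only yields positive values: `std::minstd_rand0` stands in for `os::random()`, and `BranchCounts`, `count_with_sign_check` and `count_with_parity_check` are hypothetical names introduced for illustration only.

```c++
#include <random>

// Hypothetical helper: counts how often each branch of the test loop is taken.
struct BranchCounts {
  int upserts = 0;
  int removes = 0;
};

// Stand-in for os::random() (an assumption of this sketch):
// std::minstd_rand0 only yields values in [1, 2^31 - 2], so never negative.
static int next_random(std::minstd_rand0& rng) {
  return static_cast<int>(rng());
}

// Mirrors the original condition: the sign check never selects the remove branch.
BranchCounts count_with_sign_check(int iterations) {
  std::minstd_rand0 rng;  // default seed, deterministic
  BranchCounts counts;
  for (int i = 0; i < iterations; i++) {
    int r = next_random(rng);
    if (r >= 0) {
      counts.upserts++;
    } else {
      counts.removes++;
    }
  }
  return counts;
}

// Mirrors the suggested condition: the even/odd check exercises both branches.
BranchCounts count_with_parity_check(int iterations) {
  std::minstd_rand0 rng;  // default seed, deterministic
  BranchCounts counts;
  for (int i = 0; i < iterations; i++) {
    int r = next_random(rng);
    if (r % 2 == 0) {
      counts.upserts++;
    } else {
      counts.removes++;
    }
  }
  return counts;
}
```

Calling `count_with_sign_check(10000)` leaves `removes` at zero, while `count_with_parity_check(10000)` takes both branches, which is the behaviour the even/odd distinction is meant to restore.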
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1607978359 From aboldtch at openjdk.org Tue May 21 12:14:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 May 2024 12:14:04 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Sun, 21 Apr 2024 16:30:43 GMT, Amit Kumar wrote: > s390x port for recursive locking. > > testing: > - [x] build fastdebug-vm > - [x] build slowdebug-vm > - [x] build release-vm > - [x] build optimized-vm > - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (release-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] tier1 with fastdebug-vm > - [x] tier1 with slowdebug-vm > - [x] tier1 with release-vm > > *BenchMarks*: > > Results from Performance LPARs : > > > Locking Mode = 1 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > Locking Mode = 1 (with patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 
75.863  ns/op
> LockUnlock.testRecursiveSynchronization          100  avgt   12     25.206 ±  0.519  ns/op
> LockUnlock.testSerialLockUnlock                  100  avgt   12    473.973 ±  2.103  ns/op
> LockUnlock.testSimpleLockUnlock                  100  avgt   12    470.749 ±  2.229  ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
>
>
>
> Locking Mode = 2 (without Patch)
>
> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
> LockUnlock.testContendedLock                      100  avgt   12      4.688 ±  0.051  ns/op
> LockUnlock.testRecursiveLockUnlock                100  avgt   12  12800.544 ± 92.265  ns/op
> LockUnlock.testRecursiveSynchronization           100  avgt   12     26.486 ±  2.229  ns/op
> LockUnlock.testSerialLockUnlock                   100  avgt   12    424.499 ±  0.416  ns/op
> LockUnlock.testSimpleLockUnlock                   100  avgt   12    424.241 ±  0.840  ns/op
> Finished running test 'micro:vm.lang.Lo...

This review only looks at the soundness of the algorithm, not at whether the actual implementation is correct with regard to the specific instructions, register widths, etc.

Looks good. Had a handful of comments / questions.

src/hotspot/cpu/s390/interp_masm_s390.cpp line 1013:

> 1011:     assert((JVM_ACC_IS_VALUE_BASED_CLASS & 0xFFFF) == 0, "or change following instruction");
> 1012:     z_nilh(tmp, JVM_ACC_IS_VALUE_BASED_CLASS >> 16);
> 1013:     z_brne(slow_case);

Is this change unrelated to recursive lightweight locking? If so, should it be a separate RFE?

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5757:

> 5755:
> 5756:   z_csg(mark, top, oopDesc::mark_offset_in_bytes(), obj);
> 5757:   branch_optimized(Assembler::bcondNotEqual, slow);

The previous comment and spacing make this a little harder for me to read and understand.
```C++
  // Try to lock. Transition lock-bits 0b01 => 0b00
  z_oill(mark, markWord::unlocked_value);
  z_lgr(top, mark);
  z_xilf(top, markWord::unlocked_value);
  z_csg(mark, top, oopDesc::mark_offset_in_bytes(), obj);
  branch_optimized(Assembler::bcondNotEqual, slow);
```
If the `mark |= 1; top = mark; top ^= 1;` logic requires a clarifying comment, it can be added, but without the newlines.
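
For readers less familiar with the s390 instructions, the `mark |= 1; top = mark; top ^= 1;` sequence followed by the compare-and-swap can be sketched in portable C++ roughly as follows. This is only an illustration under stated assumptions, not HotSpot code: `kUnlockedValue` mirrors the 0b01 lock bits named in the comment, while `try_fast_lock` and the `std::atomic` header word are hypothetical stand-ins.

```c++
#include <atomic>
#include <cstdint>

// Assumed for this sketch: lock bits 0b01 mean "unlocked", 0b00 mean "locked".
constexpr std::uint64_t kUnlockedValue = 0b01;

// Hypothetical helper: returns true if the header went from unlocked to locked.
bool try_fast_lock(std::atomic<std::uint64_t>& header) {
  std::uint64_t mark = header.load(std::memory_order_relaxed);
  mark |= kUnlockedValue;                     // expected value: header with the unlocked bit set
  std::uint64_t top = mark ^ kUnlockedValue;  // desired value: same word with that bit cleared
  // The swap succeeds only if the header still holds the expected unlocked pattern;
  // a locked (0b00) or monitor (0b10) header does not match, so the caller
  // would branch to the slow path.
  return header.compare_exchange_strong(mark, top);
}
```

The expected value is built from the loaded header rather than compared bit by bit, so any header that is not currently in the unlocked state simply makes the compare-and-swap fail.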
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5767: > 5765: > 5766: // as locking was successful, set CC to EQ > 5767: z_cr(top, top); // z_ahi instruction above can change the cc, so we need this Given that `lightweight_lock` is now only used from the interpreter and c1 the CC flag should not matter? src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5856: > 5854: branch_optimized(bcondAlways, slow); > 5855: > 5856: bind(unlocked); // CC is already set to EQ, if we jumped here Same as with `lightweight_lock` Given that `lightweight_unlock` is now only used from the interpreter and c1 the CC flag should not matter? src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5906: > 5904: // not inflated > 5905: > 5906: // Try to lock. Transition lock bits 0b00 => 0b01 Flipped the bits in the comment. Suggestion: // Try to lock. Transition lock bits 0b01 => 0b00 src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6015: > 6013: // we will encounter a loop while handling the inflated monitor case > 6014: // so, we need to make sure, when we reach there only top one object is removed. > 6015: // if we load top there then it could result into infinite loop, So preserving top is a Must here; I assume this refers to the assert/debug code that checks that the lock stack does not contain the object when doing inflated unlocking. The debug code could just unconditionally reload top from the thread. ------------- Changes requested by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18878#pullrequestreview-2067976856 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608176525 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608194021 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1607913366 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1607924012 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608214358 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608167611 From aboldtch at openjdk.org Tue May 21 12:14:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 May 2024 12:14:04 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Tue, 21 May 2024 11:42:28 GMT, Axel Boldt-Christmas wrote: >> s390x port for recursive locking. >> >> testing: >> - [x] build fastdebug-vm >> - [x] build slowdebug-vm >> - [x] build release-vm >> - [x] build optimized-vm >> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (release-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] tier1 with fastdebug-vm >> - [x] tier1 with slowdebug-vm >> - [x] tier1 with release-vm >> >> *BenchMarks*: >> >> Results from Performance LPARs : >> >> >> Locking Mode = 1 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 
0.559 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> Locking Mode = 1 (with patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 0.519 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> >> >> >> Locking Mode = 2 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 2.229 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 0.416 ns/op >> LockUnlock.te... > > src/hotspot/cpu/s390/interp_masm_s390.cpp line 1013: > >> 1011: assert((JVM_ACC_IS_VALUE_BASED_CLASS & 0xFFFF) == 0, "or change following instruction"); >> 1012: z_nilh(tmp, JVM_ACC_IS_VALUE_BASED_CLASS >> 16); >> 1013: z_brne(slow_case); > > Is this change unrelated to recursive lightweight? If so should it be a separate RFE? I see you implemented it in `compiler_fast_lock_lightweight_object` and changed it in `compiler_fast_lock_object` as well. Maybe that answers my question. > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5757: > >> 5755: >> 5756: z_csg(mark, top, oopDesc::mark_offset_in_bytes(), obj); >> 5757: branch_optimized(Assembler::bcondNotEqual, slow); > > The previous comment and spacing makes this a little harder for me to read and understand. > ```C++ > // Try to lock. 
Transition lock-bits 0b01 => 0b00 > z_oill(mark, markWord::unlocked_value); > z_lgr(top, mark); > z_xilf(top, markWord::unlocked_value); > z_csg(mark, top, oopDesc::mark_offset_in_bytes(), obj); > branch_optimized(Assembler::bcondNotEqual, slow); > > If comments to clarify the `mark |= 1; top = mark; top ^= 1;` logic requires a comment, it can be added but without the newlines. I saw below that you do it like this with the ` // Clear lock-bits from locked_obj (locked state)` comment added in `compiler_fast_lock_lightweight_object`. Maybe have it the same way here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608212819 PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608210500 From amitkumar at openjdk.org Tue May 21 12:28:02 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 12:28:02 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: <4JrDp9TvzydFLxoRquho7QFoAJK-bdQRrfn6oB-_u0s=.1af13c5e-d4f3-4103-8187-72917a196aa4@github.com> On Tue, 21 May 2024 12:06:08 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/s390/interp_masm_s390.cpp line 1013: >> >>> 1011: assert((JVM_ACC_IS_VALUE_BASED_CLASS & 0xFFFF) == 0, "or change following instruction"); >>> 1012: z_nilh(tmp, JVM_ACC_IS_VALUE_BASED_CLASS >> 16); >>> 1013: z_brne(slow_case); >> >> Is this change unrelated to recursive lightweight? If so should it be a separate RFE? > > I see you implemented it in `compiler_fast_lock_lightweight_object` and changed it in `compiler_fast_lock_object` as well. Maybe that answers my question. Let's start from this one. I guess we made a mistake while pushing changes in https://github.com/openjdk/jdk/pull/18709. CC in case was being set to `true`. But we required it to be `EQ` or `NE`. 
At that time, the check that you have since added to verify the CC was not there, and as a result incorrect code was pushed :-(

I wasn't sure whether I should open a separate PR to revert those changes. I have opened https://github.com/openjdk/jdk/pull/18987, but I still thought to ship it with this one.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608237237

From jsjolen at openjdk.org  Tue May 21 12:46:30 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 21 May 2024 12:46:30 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
>     static MemoryFile* make_device(const char* descriptive_name);
>     static void free_device(MemoryFile* device);
>
>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                                 MEMFLAGS flag, const NativeCallStack& stack);
>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
>   void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>     MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>   }
>   void ZNMT::commit(zoffset offset, size_t size) {
>     MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>   }
>   void ZNMT::uncommit(zoffset offset, size_t size) {
>     MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>   }
>
>   void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>     // NMT doesn't track mappings at the moment.
>   }
>   void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>     // NMT doesn't track mappings at the moment.
>   }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Fix copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/0dababc3..549a9393 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=96 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=95-96 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From amitkumar at openjdk.org Tue May 21 12:58:01 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 21 May 2024 12:58:01 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Tue, 21 May 2024 11:34:52 GMT, Axel Boldt-Christmas wrote: >> s390x port for recursive locking. 
>> >> testing: >> - [x] build fastdebug-vm >> - [x] build slowdebug-vm >> - [x] build release-vm >> - [x] build optimized-vm >> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (release-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] tier1 with fastdebug-vm >> - [x] tier1 with slowdebug-vm >> - [x] tier1 with release-vm >> >> *BenchMarks*: >> >> Results from Performance LPARs : >> >> >> Locking Mode = 1 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> Locking Mode = 1 (with patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 0.519 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> >> >> >> Locking Mode = 2 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 
2.229 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 0.416 ns/op >> LockUnlock.te... > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6015: > >> 6013: // we will encounter a loop while handling the inflated monitor case >> 6014: // so, we need to make sure, when we reach there only top one object is removed. >> 6015: // if we load top there then it could result into infinite loop, So preserving top is a Must here; > > I assume this refers to the assert/debug code that checks that the lock stack does not contain the object when doing inflated unlocking. The debug code could just unconditionally reload top from the thread. probably we can do something like this: diff --git a/src/hotspot/cpu/s390/macroAssembler_s390.cpp b/src/hotspot/cpu/s390/macroAssembler_s390.cpp index 5a77e0d49f2..cfde21c84f0 100644 --- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp +++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp @@ -5977,7 +5977,7 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis assert_different_registers(obj, tmp1, tmp2); // Handle inflated monitor. - NearLabel inflated, inflated_load_monitor; + NearLabel inflated, inflated_load_monitor, inflated_intermediate ; // Finish fast unlock successfully. MUST reach to with flag == EQ. NearLabel unlocked; // Finish fast unlock unsuccessfully. MUST branch to with flag == NE. @@ -6021,7 +6021,7 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis // Check for monitor (0b10). z_lg(mark, Address(obj, oopDesc::mark_offset_in_bytes())); z_tmll(mark, markWord::monitor_value); - z_brnaz(inflated); + z_brnaz(inflated_intermediate); #ifdef ASSERT // Check header not unlocked (0b01). 
@@ -6063,6 +6063,8 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis
   stop("Fast Unlock not monitor");
 #endif // ASSERT
 
+  bind(inflated_intermediate);
+  z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset()));
   bind(inflated);
 
 #ifdef ASSERT

But instead I kept the code similar to the other architectures and, for the future, just added the comment as a warning to be careful when tweaking the code.

I guess everything is fine except this comment?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608279960

From jsjolen at openjdk.org  Tue May 21 13:21:04 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 21 May 2024 13:21:04 GMT
Subject: RFR: 8331193: Return references when possible in GrowableArray [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 6 May 2024 09:20:43 GMT, Emanuel Peter wrote:

> Can you add a regression test that checks exactly the example that you have in your PR description?

Hi Emanuel,

I've been thinking about this a bit. We can add such a test, but it would essentially be a test that checks whether something compiles or not (which it trivially should). We can still add the test you suggested if you feel it is necessary, but it doesn't seem to me like it adds much value?

Cheers,
Johan

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18975#issuecomment-2122620584

From epeter at openjdk.org  Tue May 21 13:25:03 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 21 May 2024 13:25:03 GMT
Subject: RFR: 8331193: Return references when possible in GrowableArray [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 21 May 2024 13:18:35 GMT, Johan Sjölen wrote:

>> Can you add a regression test that checks exactly the example that you have in your PR description?
>
>> Can you add a regression test that checks exactly the example that you have in your PR description?
>
> Hi Emanuel,
>
> I've been thinking about this a bit.
We can add such a test, but it would essentially be a test that checks whether something compiles or not (which it trivially should). We can still add the test you suggested if you feel it is necessary, but it doesn't seem to me like it adds much value?
>
> Cheers,
> Johan

@jdksjolen I mean we already test the other methods, so why not test this one? And it is not just about compilation working, but also about the results that the new methods return.

If @stefank @kimbarrett say this is not necessary, then ignore this. Not sure how rigorously they want GrowableArray tested.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18975#issuecomment-2122628524

From gziemski at openjdk.org  Tue May 21 14:54:14 2024
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Tue, 21 May 2024 14:54:14 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97]
In-Reply-To:
References:
Message-ID:

On Tue, 21 May 2024 12:46:30 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>>
>> ```c++
>>     static MemoryFile* make_device(const char* descriptive_name);
>>     static void free_device(MemoryFile* device);
>>
>>     static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                                 MEMFLAGS flag, const NativeCallStack& stack);
>>     static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>>   void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>     MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>>   }
>>   void ZNMT::commit(zoffset offset, size_t size) {
>>     MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>>   }
>>   void ZNMT::uncommit(zoffset offset, size_t size) {
>>     MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>>   }
>>
>>   void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>     // NMT doesn't track mappings at the moment.
>>   }
>>   void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>     // NMT doesn't track mappings at the moment.
>>   }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright Changes requested by gziemski (Committer). src/hotspot/share/nmt/nmtTreap.hpp line 236: > 234: } > 235: > 236: void upsert(const K& k, const V& v) { Could we rename this to simply `add()` instead of `upsert()`? ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2068870426 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1608474488 From shade at openjdk.org Tue May 21 15:00:08 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 May 2024 15:00:08 GMT Subject: RFR: 8331572: Allow using OopMapCache outside of STW GC phases [v4] In-Reply-To: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> References: <_mFVw8VmpUzTscas3PU4wFHW63mgIrEPlbGPo3iTMrM=.81b20124-b3dd-4264-9d23-e4fbfc79fc78@github.com> Message-ID: On Fri, 17 May 2024 15:58:32 GMT, Aleksey Shipilev wrote: >> As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. >> >> This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. >> >> After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too.
I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. >> >> Additional testing: >> - [x] Performance test reproducer from the bug improves significantly >> - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add more GC triggers around All right, thank you all. Test re-run passes, so I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19229#issuecomment-2122826846 From shade at openjdk.org Tue May 21 15:00:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 May 2024 15:00:09 GMT Subject: Integrated: 8331572: Allow using OopMapCache outside of STW GC phases In-Reply-To: References: Message-ID: <74gZNddHCA9MbyV0SP4Tha8ywNntRbFfNZC1lIHNkw0=.f76fd1a8-4f43-4382-ad36-31c539531e61@github.com> On Tue, 14 May 2024 12:31:08 GMT, Aleksey Shipilev wrote: > As the reproducer in the issue shows, we would also like to use the `OopMapCache` during the concurrent GC phases. Zhengyu mentions there is also a production problem for stack walking that would benefit from letting `OopMapCache` be used without looking at GC at all. > > This PR unblocks `OopMapCache` uses for everything. Cleanups are nominally done by service thread. But, still appreciating that majority of use cases would be from GCs, we leave the proactive cleanups from the GC ops here as well. It requires the synchronization between readers that might be copying out the entries out of the hashmap and the concurrent reclamation. Handily, `GlobalCounter` can be used for that purpose. > > After this lands, I think we can go over `OopMapCache::compute_one_oop_map` uses and see if they would instead like to use the cached `lookup` to benefit from this cache too. 
I think those paths are for OSR and deopts, so their performance is unlikely to be critical. This PR already covers the concurrent GC paths well. > > Additional testing: > - [x] Performance test reproducer from the bug improves significantly > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` (10x) > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. Changeset: d999b81e Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/d999b81e7110751be402012e1ed41b3256f5895e Stats: 105 lines in 10 files changed: 63 ins; 14 del; 28 mod 8331572: Allow using OopMapCache outside of STW GC phases Co-authored-by: Zhengyu Gu Reviewed-by: coleenp, zgu ------------- PR: https://git.openjdk.org/jdk/pull/19229 From jsjolen at openjdk.org Tue May 21 15:22:04 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 21 May 2024 15:22:04 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v6] In-Reply-To: References: Message-ID: <4b5-Sv8u9rTTDZBVYtdVt3kakwYdPx67YdJX6xyB6Sc=.30dff1bc-31aa-4726-878c-66458045d8c6@github.com> On Tue, 21 May 2024 13:18:35 GMT, Johan Sjölen wrote: >> Can you add a regression test that checks exactly the example that you have in your PR description? > >> Can you add a regression test that checks exactly the example that you have in your PR description? > > Hi Emanuel, > > I've been thinking about this a bit. We can add such a test, but it would essentially be a test that checks whether something compiles or not (which it trivially should). We can still add the test you suggested if you feel it is necessary, but it doesn't seem to me like it adds much value? > > Cheers, > Johan > @jdksjolen I mean we already test the other methods, so why not test this one? And it is not just about compilation working, but also the results that the new methods return.
> > If @stefank @kimbarrett say this is not necessary, then ignore this. Not sure how rigorous they want GrowableArray tested. Alright, I'll add a couple of tests and ping you for review on them :). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18975#issuecomment-2122873912 From gziemski at openjdk.org Tue May 21 15:25:14 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 21 May 2024 15:25:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 12:46:30 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright Changes requested by gziemski (Committer). src/hotspot/share/nmt/nmtTreap.hpp line 98: > 96: _prng_seed = (PrngMult * _prng_seed + PrngAdd) & PrngModMask; > 97: return _prng_seed; > 98: } We are now adding a 3rd standalone identical implementation, there are 2 already in: - ThreadHeapSampler::next_random() - JfrPRNG::next_uniform() Perhaps it's high time to consider moving this into `share/utilities`? ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2068944880 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1608521869 From gziemski at openjdk.org Tue May 21 15:33:12 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 21 May 2024 15:33:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97] In-Reply-To: References: Message-ID: <0Fz-0O_EzGBauE9_jELPy1pw1xQXIhlED3r93qgscvM=.4d4a1a09-4aef-4b0a-a750-962ffa17e4c5@github.com> On Tue, 21 May 2024 12:46:30 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright Changes requested by gziemski (Committer). src/hotspot/share/nmt/nmtTreap.hpp line 226: > 224: > 225: public: > 226: Treap(uint64_t seed = static_cast<uint64_t>(os::random())) Do we need 64 bit random number here? If we really need 64 bits shouldn't we do something like: ` Treap(uint64_t seed = static_cast<uint64_t>(os::random())) | (static_cast<uint64_t>(os::random())) << 32)` or add `os::random_64bits()` that does exactly this? Notice that I filed https://bugs.openjdk.org/browse/JDK-8332618 asking to rename `os::random()` to `os::random_32bits()`, so this would go nicely with that change later. ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2068965111 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1608534422 From gziemski at openjdk.org Tue May 21 15:41:17 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 21 May 2024 15:41:17 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 12:46:30 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright Changes requested by gziemski (Committer). test/hotspot/gtest/nmt/test_nmt_treap.cpp line 331: > 329: treap.remove(i); > 330: } > 331: verify_it(treap); Now that we fixed how we use `random()` here the expected depth of the tree should be close to 0, which will not exercise the randomness of priorities. We need to add back a test case that only adds (no remove) to correctly exercise and check the tree depth. ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2068983752 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1608546155 From asemenyuk at openjdk.org Tue May 21 15:57:06 2024 From: asemenyuk at openjdk.org (Alexey Semenyuk) Date: Tue, 21 May 2024 15:57:06 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning.
More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments `jdk.jpackage` changes look good ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2122942586 From jsjolen at openjdk.org Tue May 21 16:17:14 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 21 May 2024 16:17:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 15:21:59 GMT, Gerard Ziemski wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright > > src/hotspot/share/nmt/nmtTreap.hpp line 98: > >> 96: _prng_seed = (PrngMult * _prng_seed + PrngAdd) & PrngModMask; >> 97: return _prng_seed; >> 98: } > > We are now adding a 3rd standalone identical implementation, there are 2 already in: > > - ThreadHeapSampler::next_random() > - JfrPRNG::next_uniform() > > Perhaps it's high time to consider moving this into `share/utilities` ? I agree, let's make a separate RFE for that.
I'm keeping track of the instances where I've said that, so we shouldn't miss those opportunities :-). > src/hotspot/share/nmt/nmtTreap.hpp line 226: > >> 224: >> 225: public: >> 226: Treap(uint64_t seed = static_cast<uint64_t>(os::random())) > > Do we need 64 bit random number here? If we really need 64 bits shouldn't we do something like: > > ` Treap(uint64_t seed = static_cast<uint64_t>(os::random())) | (static_cast<uint64_t>(os::random())) << 32)` > > or add `os::random_62bits()` that does exactly this? > > Notice that I filed https://bugs.openjdk.org/browse/JDK-8332618 asking to rename `os::random()` to `os::random_31bits()`, so this would go nicely with that change later. I just want to get as much entropy as possible. Yeah, let's do the bit shift inline for now ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1608605318 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1608603873 From prr at openjdk.org Tue May 21 16:45:06 2024 From: prr at openjdk.org (Phil Race) Date: Tue, 21 May 2024 16:45:06 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: <2sHuToXAXHYrqtE31r7-wDvJ3JM0nQYujuLFAtqWQQI=.3c61631b-ecb1-4073-9b5f-6a379ab614cf@github.com> On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls.
In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments client parts look fine. ------------- Marked as reviewed by prr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19213#pullrequestreview-2069134455 From asemenyuk at openjdk.org Tue May 21 16:59:04 2024 From: asemenyuk at openjdk.org (Alexey Semenyuk) Date: Tue, 21 May 2024 16:59:04 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 08:44:47 GMT, Maurizio Cimadamore wrote: > These are all good suggestions. I have not looked into jpackage, but yes, I would expect that the jpackage user would need to provide extra options when packaging the application. 
It would be good to document how jpackage users packaging apps with native access will be affected by this change. Primarily that they need to pass `--illegal-native-access` parameter to affected jpackage app launchers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2123054154 From duke at openjdk.org Tue May 21 17:41:46 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 21 May 2024 17:41:46 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12] In-Reply-To: References: Message-ID:
> Performance. Before:
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6443.934  ± 6.491    ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6152.979  ± 4.954    ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1895.410  ± 36.979   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1878.955  ± 45.487   ops/s
>
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1357.810  ± 26.584  ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1352.119  ± 23.547  ops/s
>
> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error     Units
> PolynomialP256Bench.benchMultiply  false          thrpt  3    1746.126  ± 10.970  ops/s
>
> Performance, no intrinsic:
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6529.839  ± 42.420   ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6199.747  ± 133.566  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1973.676  ± 54.071   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1932.127  ± 35.920   ops/s
>
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error     Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1355.788  ± 29.858  ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1346.523  ± 28.722  ops/s
>
> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error     Units
> PolynomialP256Bench.benchMultiply  true           thrpt  3    1919.574  ± 10.591  ops/s
>
> Performance, **with intrinsics*...

Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into ecc-montgomery - shenandoah verifier - comments from Sandhya - whitespace - add message back - whitespace - Use AffinePoint to exit Montgomery domain Style notes: Affine.equals() - Mismatched fields only appear to be used from testing, perhaps should be moved there instead Affine.getX(boolean)|getY(boolean) - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks - needed to convert Affine to Projective (need to stay in montgomery domain) ECOperations.PointMultiplier - changes could probably be restored to original (since ProjectivePoint handling no longer required) - consider these changes an improvement? (fewer nested classes) - was an inner-class but not using inner-class features (i.e. ecOps variable should be converted) - whitespace - Comments from Tony and Jatin - Comments from Jatin and Tony - ...
and 7 more: https://git.openjdk.org/jdk/compare/12e8009b...b1a33004 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18583/files - new: https://git.openjdk.org/jdk/pull/18583/files/df4fe6fa..b1a33004 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=10-11 Stats: 190975 lines in 3949 files changed: 105304 ins; 64688 del; 20983 mod Patch: https://git.openjdk.org/jdk/pull/18583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583 PR: https://git.openjdk.org/jdk/pull/18583 From duke at openjdk.org Tue May 21 17:41:46 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 21 May 2024 17:41:46 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 07:21:14 GMT, Tobias Hartmann wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> shenandoah verifier > > I'm getting some conflicts when trying to apply this to master. Could you please merge the PR? Hi @TobiHartmann , merged with no issues for me. Could you please run the tests again? (I think Tony did run them, but can't hurt verifying again). Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2123122468 From sviswanathan at openjdk.org Tue May 21 18:41:11 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 21 May 2024 18:41:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: <2y8TuEb98PH5hxKQAxPdnPfuqqkDmGDmHxS6byTZoas=.7c1f9bc9-75c6-4057-8b74-35cb1a086509@github.com> On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Addressing lots of comments. Interim commit. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4648: > 4646: vpxor(vec1, vec2); > 4647: > 4648: vptest(vec1, vec1); These should be only 128 bit: pxor(vec1, vec2); ptest(vec1, vec1); src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1351: > 1349: assert_different_registers(needle, needleVal); > 1350: > 1351: bool isLL = (ae == StrIntrinsicNode::LL); isLL not used in this function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1608732591 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1605624430 From cjplummer at openjdk.org Tue May 21 19:09:04 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 May 2024 19:09:04 GMT Subject: RFR: 8331683: Clean up GetCarrierThread In-Reply-To: References: Message-ID: On Sat, 18 May 2024 00:47:59 GMT, Alex Menkov wrote: > JVMTI GetCarrierThread extension function was introduced by loom for testing. > It's used by several tests in hotspot/jtreg/serviceability. > > Testings: tier1..tier6 Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19289#pullrequestreview-2069434194 From sgibbons at openjdk.org Wed May 22 02:07:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 02:07:36 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v21] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fixed CI compiles; re-factor UL processing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/9a861979..38868a35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=19-20 Stats: 570 lines in 2 files changed: 327 ins; 158 del; 85 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 22 02:07:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 02:07:36 GMT Subject: RFR: 8320448: 
Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: <2K6GTqVka0-FS4NQcZ6z6izsDZVC1DuN1GuzzpkLlZk=.3853f424-d8fc-4c65-827d-a7abb321f38e@github.com> On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Addressing lots of comments. Interim commit. Comment on behalf of @sviswa7 : Unclear whether `size` in `byte_compare_helper` is intended to be in bytes or in elements. Please check its consistency. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2123736900 From sgibbons at openjdk.org Wed May 22 02:07:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 02:07:36 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: <2y8TuEb98PH5hxKQAxPdnPfuqqkDmGDmHxS6byTZoas=.7c1f9bc9-75c6-4057-8b74-35cb1a086509@github.com> References: <2y8TuEb98PH5hxKQAxPdnPfuqqkDmGDmHxS6byTZoas=.7c1f9bc9-75c6-4057-8b74-35cb1a086509@github.com> Message-ID: On Tue, 21 May 2024 18:03:41 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing lots of comments. Interim commit. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4648: > >> 4646: vpxor(vec1, vec2); >> 4647: >> 4648: vptest(vec1, vec1); > > These should be only 128 bit: > pxor(vec1, vec2); > ptest(vec1, vec1); Fixed > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1351: > >> 1349: assert_different_registers(needle, needleVal); >> 1350: >> 1351: bool isLL = (ae == StrIntrinsicNode::LL); > > isLL not used in this function. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609164643 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609164578 From duke at openjdk.org Wed May 22 02:53:23 2024 From: duke at openjdk.org (kuaiwei) Date: Wed, 22 May 2024 02:53:23 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: > The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It shows a regression on some platforms, like Apple silicon on macOS > 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > It can be fixed by: > 1 Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we can see a performance improvement (N1 or N2) > 2 Checking for the special pattern and merging the subsequent dmb. > > It also fixes a bug: when the code buffer is expanding, st/ld/dmb cannot be merged. I added unit tests for these. > > This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and cannot merge all three, because when emitting dmb.ish, merging all previous dmbs would shrink the code buffer. I think it may break some assumption and think it's not a common pattern. > > In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging, but it looks risky to keep an instruction pending during emitting.
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Make MacroAssembler::merge more clear ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19278/files - new: https://git.openjdk.org/jdk/pull/19278/files/b71a1b31..8767e8fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=00-01 Stats: 8 lines in 1 file changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278 PR: https://git.openjdk.org/jdk/pull/19278 From thartmann at openjdk.org Wed May 22 05:03:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 May 2024 05:03:15 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ± 23.547 ops/s
>> Benchmark (isMontBench) Mode Cnt Score Error Units
>> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s
>>
>> Performance, no intrinsic:
>>
>> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s
>> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s
>> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s
>> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s
>> Benchmark (isMontBench) Mode Cnt Score Error Units
>> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57...
> > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into ecc-montgomery > - shenandoah verifier > - comments from Sandhya > - whitespace > - add message back > - whitespace > - Use AffinePoint to exit Montgomery domain > > Style notes: > Affine.equals() > - Mismatched fields only appear to be used from testing, perhaps should be moved there instead > Affine.getX(boolean)|getY(boolean) > - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks > - needed to convert Affine to Projective (need to stay in montgomery domain) > ECOperations.PointMultiplier > - changes could probably be restored to original (since ProjectivePoint handling no longer required) > - consider these changes an improvement? (fewer nested classes) > - was an inner-class but not using inner-class features (i.e.
ecOps variable should be converted) > - whitespace > - Comments from Tony and Jatin > - Comments from Jatin and Tony > - ... and 7 more: https://git.openjdk.org/jdk/compare/9ee91a9f...b1a33004 Thanks! I submitted testing and will report back once it passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2123869579 From ddong at openjdk.org Wed May 22 06:17:07 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 22 May 2024 06:17:07 GMT Subject: RFR: 8326012: JFR: Event for time to safepoint [v11] In-Reply-To: References: <68hS0kQgtDIk4ioAJj_r0_GLT6h0lcif6Daj6WRwxlI=.40c2a6e7-70a8-4954-bcde-9318ee311028@github.com> Message-ID: On Fri, 12 Apr 2024 13:08:06 GMT, Denghui Dong wrote: >> There are now some JFR events related to safepoint. When time-to-safepoint (aka ttsp) is too long, these events could not be very helpful since based on them we cannot know which threads cause it and what those threads are doing. >> >> Users can use `-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=100` to see the threads that don't reach safepoint in time but without stack traces. Using `-XX:+ AbortVMOnSafepointTimeout` can capture the stack traces but it crashes the process, hence it's not sensible to enable the flag in production. >> >> ~~This patch adds a new JFR event `EventSafepointTimeout` to record the threads that cause ttsp too long.~~ >> >> ~~This event includes two fields:~~ >> >> ~~- safepointId: the relevant safepoint id~~ >> ~~- timeExceeded: the amount of time exceeding `SafepointTimeoutDelay` used by the thread to reach safepoint~~ >> >> ~~In the current version, this event records the stack of those problematic threads when they finally reach safepoint. Hence, there is a bias, but it's still helpful to deduce the root place.~~ >> >> A better implementation is to record a more accurate stack, but this will increase complexity. 
At the same time, the native stack may also be important for this problem, but it is not currently supported by JFR. >> >> Any input would be greatly appreciated. >> >> Testing: jdk/jdk/jfr > > Denghui Dong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8326012 > - update > - delete _entries when disabled > - fix test failures > - update > - refactor > - update > - update > - update > - update > - ... and 11 more: https://git.openjdk.org/jdk/compare/0f78d017...df58b055 There are no more feedbacks. I plan to close it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17888#issuecomment-2123942011 From fyang at openjdk.org Wed May 22 06:57:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 22 May 2024 06:57:07 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v3] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 08:56:19 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into 8332265 > - Small review update > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 Hi, Nice work! 
I only have several minor comments. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1423: > 1421: Assembler::patch(branch + 12, 31, 20, (lower >> 6) & 0x7ff); // Addi. target[16: 6] ==> branch[31:20] > 1422: Assembler::patch(branch + 20, 31, 20, lower & 0x3f); // Addi/Jalr/Load. target[ 5: 0] ==> branch[31:20] > 1423: return MOVPTR_INSTRUCTIONS_NUM * NativeInstruction::instruction_size; Maybe rename `MOVPTR_INSTRUCTIONS_NUM` as `MOVPTR1_INSTRUCTIONS_NUM`? (And `MOVPTR2_INSTRUCTIONS_NUM` for `patch_addr_in_movptr2` at the same time) Or simply remove `MOVPTR_INSTRUCTIONS_NUM`: `return 6 * NativeInstruction::instruction_size; // lui + addi + slli + addi + slli + addi/jalr/load` src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1565: > 1563: } else if (NativeInstruction::is_pc_relative_at(instruction_address)) { // auipc, addi/jalr/load > 1564: return patch_offset_in_pc_relative(instruction_address, offset); > 1565: } else if (NativeInstruction::is_movptr1_at(instruction_address)) { // movptr Code comment: s/movptr/movptr1/ src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1598: > 1596: offset = get_offset_of_pc_relative(insn_addr); > 1597: } else if (NativeInstruction::is_movptr1_at(insn_addr)) { // movptr > 1598: return get_target_of_movptr(insn_addr); Maybe rename `get_target_of_movptr` as `get_target_of_movptr1` so that it will be consistent in naming? src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 381: > 379: enum RISCV_specific_constants { > 380: movptr1_instruction_size = 6 * NativeInstruction::instruction_size, // lui, addi, slli, addi, slli, addi. See movptr(). > 381: movptr2_instruction_size = 5 * NativeInstruction::instruction_size, // lui, lui, slli, add, addi. See movptr2_imp(). Code comment needs update. No `movptr2_imp`? 
src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 403: > 401: // Assume: auipc, ld > 402: return addr_at(load_pc_relative_instruction_size); > 403: } else if (is_movptr2_at(instruction_address())) { Move this after the `if (is_movptr1_at(instruction_address())) {` check to group the two together as always? src/hotspot/cpu/riscv/riscv.ad line 1290: > 1288: // skip the movptr2 in MacroAssembler::ic_call(): > 1289: // lui + addi + slli + addi + slli + addi > 1290: // Though movptr() has already 4-byte aligned with or without RVC, Code comment here needs update too. ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2069949779 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609375185 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609378500 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609190379 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609151043 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609241656 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609157823 From fyang at openjdk.org Wed May 22 06:57:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 22 May 2024 06:57:08 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: <2f25EhAHETKwXhFcg6nE_W37QAU7U7opYHa8Wzo2MfU=.05e5cfce-2d3b-4825-a8af-7963d4c266f7@github.com> Message-ID: On Tue, 21 May 2024 07:58:33 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1426: >> >>> 1424: } >>> 1425: >>> 1426: static int patch_addr_in_movptr2(address instruction_address, address target) { >> >> Can we have a common entry of `patch_addr_in_movptr` which delegates work to `patch_addr_in_movptr1` and `patch_addr_in_movptr2`? 
> > I think it makes sense to split them up as the difference between movptr1 and movptr2 is already done in the caller then. And IIUC we don't plan to merge movptr1 and movptr2 together at any point, so we don't particularly need to abstract them away. All right. I can live with this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609385061 From aboldtch at openjdk.org Wed May 22 07:18:02 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 22 May 2024 07:18:02 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: <8ftPbjSfPRGU8ibdxLD7cxBsC0U26dJgZf8IzHdK0ng=.e3769ecf-87b2-46a5-98bd-22a27a068be0@github.com> On Tue, 21 May 2024 12:55:09 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6015: >> >>> 6013: // we will encounter a loop while handling the inflated monitor case >>> 6014: // so, we need to make sure, when we reach there only top one object is removed. >>> 6015: // if we load top there then it could result into infinite loop, So preserving top is a Must here; >> >> I assume this refers to the assert/debug code that checks that the lock stack does not contain the object when doing inflated unlocking. The debug code could just unconditionally reload top from the thread. > > probably we can do something like this: > > diff --git a/src/hotspot/cpu/s390/macroAssembler_s390.cpp b/src/hotspot/cpu/s390/macroAssembler_s390.cpp > index 5a77e0d49f2..cfde21c84f0 100644 > --- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp > +++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp > @@ -5977,7 +5977,7 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis > assert_different_registers(obj, tmp1, tmp2); > > // Handle inflated monitor. 
> - NearLabel inflated, inflated_load_monitor; > + NearLabel inflated, inflated_load_monitor, inflated_intermediate ; > // Finish fast unlock successfully. MUST reach to with flag == EQ. > NearLabel unlocked; > // Finish fast unlock unsuccessfully. MUST branch to with flag == NE. > @@ -6021,7 +6021,7 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis > // Check for monitor (0b10). > z_lg(mark, Address(obj, oopDesc::mark_offset_in_bytes())); > z_tmll(mark, markWord::monitor_value); > - z_brnaz(inflated); > + z_brnaz(inflated_intermediate); > > #ifdef ASSERT > // Check header not unlocked (0b01). > @@ -6063,6 +6063,8 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis > stop("Fast Unlock not monitor"); > #endif // ASSERT > > + bind(inflated_intermediate); > + z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset())); > bind(inflated); > > #ifdef ASSERT > > > But instead I kept the code a bit similar to other architectures and for future just added the comment as a warning to be careful which tweaking the code. I guess except this comment everything is fine ? The current code is fine, but that comment made me wonder why preserving the original top value was important. My think was that you could only change the assert snippet as follows: #ifdef ASSERT NearLabel check_done; + NearLabel loop; + z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset())); + bind(loop); z_aghi(top, -oopSize); compareU32_and_branch(top, in_bytes(JavaThread::lock_stack_base_offset()), bcondLow, check_done); z_cg(obj, Address(Z_thread, top)); - z_brne(inflated); + z_brne(loop); stop("Fast Unlock lock on stack"); bind(check_done); #endif // ASSERT then remove the comment and use either whatever register. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1609416066 From aboldtch at openjdk.org Wed May 22 07:18:02 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 22 May 2024 07:18:02 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: <8ftPbjSfPRGU8ibdxLD7cxBsC0U26dJgZf8IzHdK0ng=.e3769ecf-87b2-46a5-98bd-22a27a068be0@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> <8ftPbjSfPRGU8ibdxLD7cxBsC0U26dJgZf8IzHdK0ng=.e3769ecf-87b2-46a5-98bd-22a27a068be0@github.com> Message-ID: On Wed, 22 May 2024 07:12:30 GMT, Axel Boldt-Christmas wrote: >> probably we can do something like this: >> >> diff --git a/src/hotspot/cpu/s390/macroAssembler_s390.cpp b/src/hotspot/cpu/s390/macroAssembler_s390.cpp >> index 5a77e0d49f2..cfde21c84f0 100644 >> --- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp >> +++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp >> @@ -5977,7 +5977,7 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis >> assert_different_registers(obj, tmp1, tmp2); >> >> // Handle inflated monitor. >> - NearLabel inflated, inflated_load_monitor; >> + NearLabel inflated, inflated_load_monitor, inflated_intermediate ; >> // Finish fast unlock successfully. MUST reach to with flag == EQ. >> NearLabel unlocked; >> // Finish fast unlock unsuccessfully. MUST branch to with flag == NE. >> @@ -6021,7 +6021,7 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis >> // Check for monitor (0b10). >> z_lg(mark, Address(obj, oopDesc::mark_offset_in_bytes())); >> z_tmll(mark, markWord::monitor_value); >> - z_brnaz(inflated); >> + z_brnaz(inflated_intermediate); >> >> #ifdef ASSERT >> // Check header not unlocked (0b01). 
>> @@ -6063,6 +6063,8 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis >> stop("Fast Unlock not monitor"); >> #endif // ASSERT >> >> + bind(inflated_intermediate); >> + z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset())); >> bind(inflated); >> >> #ifdef ASSERT >> >> >> But instead I kept the code a bit similar to other architectures and for future just added the comment as a warning to be careful which tweaking the code. I guess except this comment everything is fine ? > > The current code is fine, but that comment made me wonder why preserving the original top value was important. My think was that you could only change the assert snippet as follows: > > #ifdef ASSERT > NearLabel check_done; > + NearLabel loop; > + z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset())); > + bind(loop); > z_aghi(top, -oopSize); > compareU32_and_branch(top, in_bytes(JavaThread::lock_stack_base_offset()), > bcondLow, check_done); > z_cg(obj, Address(Z_thread, top)); > - z_brne(inflated); > + z_brne(loop); > stop("Fast Unlock lock on stack"); > bind(check_done); > #endif // ASSERT > > > then remove the comment and use either whatever register. I still do not think I understand what ` if we load top there then it could result into infinite loop` is referring to. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1609421183 From stefank at openjdk.org Wed May 22 07:52:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 22 May 2024 07:52:05 GMT Subject: RFR: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved [v9] In-Reply-To: References: Message-ID: On Mon, 6 May 2024 03:33:30 GMT, Liming Liu wrote: >> The testcase failed on Oracle CI since JDK-8315923. 
The root cause is that Oracle CI runs Linux-5.4.17-UEK where the value of MADV_POPULATE_WRITE (23) is used as MADV_DONTEXEC which is not supported by upstream. This PR solves the testcase failure by checking versions of kernels first, and checking the availability of MADV_POPULATE_WRITE when they are not older than 5.14. > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Fix the wrong condition I'm running this through our tier1-tier3 testing now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18592#issuecomment-2124103103 From jsjolen at openjdk.org Wed May 22 08:21:47 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 May 2024 08:21:47 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v98] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Bit shift for 64bits of entropy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/549a9393..8e4f8bec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=97 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=96-97 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From azafari at openjdk.org Wed May 22 08:34:25 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 22 May 2024 08:34:25 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API Message-ID: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> This PR fixes the problems that existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: 1- The `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in the ctor. Since reserving memory regions may go through a try-and-fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. Also, some assertions are added to the getters of a `ReservedSpace` to check whether the region was successfully reserved. 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub-regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`.
This is problematic when a requested region `R` is first size-aligned to `R1---R---R2` and then `R1` and `R2` are released (the `chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` on the assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT region list, and when a new reserve happens in those regions, NMT complains by raising an exception. ------------- Commit messages: - 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API Changes: https://git.openjdk.org/jdk/pull/19343/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19343&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331539 Stats: 609 lines in 62 files changed: 78 ins; 148 del; 383 mod Patch: https://git.openjdk.org/jdk/pull/19343.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19343/head:pull/19343 PR: https://git.openjdk.org/jdk/pull/19343 From rehn at openjdk.org Wed May 22 08:35:33 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 08:35:33 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v4] In-Reply-To: References: Message-ID: > Hi, please consider! > > Materializing a 48-bit pointer, using an additional register, we can do with: > lui + lui + slli + add + addi > This 15% faster both on VF2 and in CPU models, compared to movptr(). > > As we often materialize during calls there is free registers. > > I have choose just a few spot to use it, many more can use. > E.g. la() with tmp register can use li48 instead of movptr. > > Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. > And benchmarks when hardware is free. Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Review changes - Merge branch 'master' into 8332265 - Merge branch 'master' into 8332265 - Small review update - li48 -> movptr - Merge branch 'master' into 8332265 - li48 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19246/files - new: https://git.openjdk.org/jdk/pull/19246/files/c406294a..e017302b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=02-03 Stats: 1352 lines in 57 files changed: 833 ins; 330 del; 189 mod Patch: https://git.openjdk.org/jdk/pull/19246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246 PR: https://git.openjdk.org/jdk/pull/19246 From rehn at openjdk.org Wed May 22 08:35:34 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 08:35:34 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v3] In-Reply-To: <11cTZDsDanZQl1JRMmWTzj4hU53WuXEfYUiOL-Qowcs=.06e33378-bf1c-4728-9ea4-c4771cec5bf8@github.com> References: <11cTZDsDanZQl1JRMmWTzj4hU53WuXEfYUiOL-Qowcs=.06e33378-bf1c-4728-9ea4-c4771cec5bf8@github.com> Message-ID: On Tue, 21 May 2024 08:53:03 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into 8332265 >> - Small review update >> - li48 -> movptr >> - Merge branch 'master' into 8332265 >> - li48 > > Marked as reviewed by luhenry (Committer). Thanks @luhenry ! Thanks for the second review pass @RealFYang ! 
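For readers following the 8332265 thread: the `lui + lui + slli + add + addi` sequence quoted in the PR description can be modelled in plain C++. This is only a sketch under stated assumptions. The 18/18/12-bit split and the names `Split48`, `split48` and `materialize48` are made up for illustration; the thread does not show the field widths or helpers the patch actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decomposition of a 48-bit address into three immediates.
struct Split48 {
  int64_t upper;  // immediate for the first lui (destination register)
  int64_t mid;    // immediate for the second lui (temp register)
  int64_t low12;  // sign-extended 12-bit immediate for the final addi
};

Split48 split48(uint64_t addr) {
  assert(addr < (uint64_t{1} << 48));
  // addi sign-extends its immediate, so treat the low 12 bits as signed.
  int64_t low12 = static_cast<int64_t>(addr & 0xfff);
  if (low12 >= 0x800) low12 -= 0x1000;
  // What remains is a non-negative multiple of 4096.
  int64_t pages = (static_cast<int64_t>(addr) - low12) >> 12;
  // Both lui immediates stay below 2^19, so lui's own sign extension is a no-op.
  return Split48{pages >> 18, pages & 0x3ffff, low12};
}

// Models the five instructions on 64-bit registers.
uint64_t materialize48(const Split48& s) {
  int64_t rd  = s.upper << 12;  // lui  rd,  upper
  int64_t tmp = s.mid << 12;    // lui  tmp, mid
  rd <<= 18;                    // slli rd,  rd, 18
  rd += tmp;                    // add  rd,  rd, tmp
  rd += s.low12;                // addi rd,  rd, low12
  return static_cast<uint64_t>(rd);
}
```

The two `lui` instructions do not depend on each other, which is presumably where the extra register buys its speedup over a purely serial chain; that reading is an inference, not something stated in the thread.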
------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2124189055 From rehn at openjdk.org Wed May 22 08:35:35 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 08:35:35 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v3] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 06:42:56 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into 8332265 >> - Small review update >> - li48 -> movptr >> - Merge branch 'master' into 8332265 >> - li48 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1423: > >> 1421: Assembler::patch(branch + 12, 31, 20, (lower >> 6) & 0x7ff); // Addi. target[16: 6] ==> branch[31:20] >> 1422: Assembler::patch(branch + 20, 31, 20, lower & 0x3f); // Addi/Jalr/Load. target[ 5: 0] ==> branch[31:20] >> 1423: return MOVPTR_INSTRUCTIONS_NUM * NativeInstruction::instruction_size; > > Maybe rename `MOVPTR_INSTRUCTIONS_NUM` as `MOVPTR1_INSTRUCTIONS_NUM`? (And `MOVPTR2_INSTRUCTIONS_NUM` for `patch_addr_in_movptr2` at the same time) > Or simply remove `MOVPTR_INSTRUCTIONS_NUM`: > `return 6 * NativeInstruction::instruction_size; // lui + addi + slli + addi + slli + addi/jalr/load` We already have this defined in nativeinst, I used that enum. 
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1565: > >> 1563: } else if (NativeInstruction::is_pc_relative_at(instruction_address)) { // auipc, addi/jalr/load >> 1564: return patch_offset_in_pc_relative(instruction_address, offset); >> 1565: } else if (NativeInstruction::is_movptr1_at(instruction_address)) { // movptr > > Code comment: s/movptr/movptr1/ Fixed > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1598: > >> 1596: offset = get_offset_of_pc_relative(insn_addr); >> 1597: } else if (NativeInstruction::is_movptr1_at(insn_addr)) { // movptr >> 1598: return get_target_of_movptr(insn_addr); > > Maybe rename `get_target_of_movptr` as `get_target_of_movptr1` so that it will be consistent in naming? Fixed > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 381: > >> 379: enum RISCV_specific_constants { >> 380: movptr1_instruction_size = 6 * NativeInstruction::instruction_size, // lui, addi, slli, addi, slli, addi. See movptr(). >> 381: movptr2_instruction_size = 5 * NativeInstruction::instruction_size, // lui, lui, slli, add, addi. See movptr2_imp(). > > Code comment needs update. No `movptr2_imp`? Fixed > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 403: > >> 401: // Assume: auipc, ld >> 402: return addr_at(load_pc_relative_instruction_size); >> 403: } else if (is_movptr2_at(instruction_address())) { > > Move this after the `if (is_movptr1_at(instruction_address())) {` check to group the two together as always? Fixed > src/hotspot/cpu/riscv/riscv.ad line 1290: > >> 1288: // skip the movptr2 in MacroAssembler::ic_call(): >> 1289: // lui + addi + slli + addi + slli + addi >> 1290: // Though movptr() has already 4-byte aligned with or without RVC, > > Code comment here needs update too. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609535038 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609535247 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609533925 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609533631 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609534062 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609533778 From ihse at openjdk.org Wed May 22 08:37:10 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 22 May 2024 08:37:10 GMT Subject: RFR: 8332473: ubsan: growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null In-Reply-To: References: <-LubBa-IRTqX4WOO-P9_9ulsmTV2KUgUAwZjbiRKcZg=.f3958562-a66d-4b09-9136-002f0736c472@github.com> Message-ID: <0rmbjA6P58dPE1IP-uCbGr2El-50YBy_N7U-O6lF4H4=.d7a5b3a6-d5b7-476b-828e-cf6caeafcc57@github.com> On Fri, 17 May 2024 16:21:57 GMT, Matthias Baesken wrote: >> On Linux x86_64 fastdebug with ubsan enabled we run into this error because we call qsort with a first parameter that is null. 
>> >> /jdk/src/hotspot/share/utilities/growableArray.hpp:290:10: runtime error: null pointer passed as argument 1, which is declared to never be null >> #0 0x150d701bb4b1 in GrowableArrayView::sort(int (*)(nmethod**, nmethod**)) /jdk/src/hotspot/share/utilities/growableArray.hpp:290 >> #1 0x150d701bb4b1 in ClassUnloadingContext::free_nmethods() /jdk/src/hotspot/share/gc/shared/classUnloadingContext.cpp:159 >> #2 0x150d71f5cca3 in G1CollectedHeap::unload_classes_and_code(char const*, BoolObjectClosure*, GCTimer*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:2538 >> #3 0x150d71ffb009 in G1FullCollector::phase1_mark_live_objects() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:330 >> #4 0x150d71ffc675 in G1FullCollector::collect() /jdk/src/hotspot/share/gc/g1/g1FullCollector.cpp:209 >> #5 0x150d71f3e593 in G1CollectedHeap::do_full_collection(bool, bool) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:842 >> #6 0x150d71f5b12d in G1CollectedHeap::satisfy_failed_allocation_helper(unsigned long, bool, bool, bool, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:917 >> #7 0x150d71f5b3dc in G1CollectedHeap::satisfy_failed_allocation(unsigned long, bool*) /jdk/src/hotspot/share/gc/g1/g1CollectedHeap.cpp:930 >> #8 0x150d721835f7 in VM_G1CollectForAllocation::doit() /jdk/src/hotspot/share/gc/g1/g1VMOperations.cpp:127 >> #9 0x150d74291ec8 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 >> #10 0x150d742ca1be in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 >> #11 0x150d742cb9e7 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 >> #12 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 >> #13 0x150d742cc601 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 >> >> seems we sometimes call qsort with nullptr as first parameter, this is not recommended. 
>> When adding a guarantee the same can be seen (_data is null). >> So better add a check and do not sort, if there is nothing provided to be sorted . > > Hi Johan, thanks for the review . > > btw seems I found a similar one > > > /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:562:5: runtime error: null pointer passed as argument 2, which is declared to never be null > #0 0x7fd95bec78d8 in spawnChild /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:562 > #1 0x7fd95bec78d8 in startChild /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:612 > #2 0x7fd95bec78d8 in Java_java_lang_ProcessImpl_forkAndExec /jdk/src/java.base/unix/native/libjava/ProcessImpl_md.c:712 > #3 0x7fd93797a06d () > > > but here it is memcpy not qsort . > ` memcpy(buf+offset, c->pdir, sp.dirlen);` gets a second parameter null. > Something similar was discussed and fixed here https://bugs.python.org/issue27570 for Python . > > More info can be found here https://github.com/bellard/quickjs/issues/225 @MBaesken Thank you for your efforts of making the ubsan actually usable! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19283#issuecomment-2124192774 From amitkumar at openjdk.org Wed May 22 08:41:02 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 22 May 2024 08:41:02 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> <8ftPbjSfPRGU8ibdxLD7cxBsC0U26dJgZf8IzHdK0ng=.e3769ecf-87b2-46a5-98bd-22a27a068be0@github.com> Message-ID: On Wed, 22 May 2024 07:14:59 GMT, Axel Boldt-Christmas wrote: >> The current code is fine, but that comment made me wonder why preserving the original top value was important. 
My think was that you could only change the assert snippet as follows: >> >> #ifdef ASSERT >> NearLabel check_done; >> + NearLabel loop; >> + z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset())); >> + bind(loop); >> z_aghi(top, -oopSize); >> compareU32_and_branch(top, in_bytes(JavaThread::lock_stack_base_offset()), >> bcondLow, check_done); >> z_cg(obj, Address(Z_thread, top)); >> - z_brne(inflated); >> + z_brne(loop); >> stop("Fast Unlock lock on stack"); >> bind(check_done); >> #endif // ASSERT >> >> >> then remove the comment and use either whatever register. > > I still do not think I understand what ` if we load top there then it could result into infinite loop` is referring to. Initial version of my code change was like this: diff --git a/src/hotspot/cpu/s390/macroAssembler_s390.cpp b/src/hotspot/cpu/s390/macroAssembler_s390.cpp index 5a77e0d49f2..8fd3e241ca8 100644 --- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp +++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp @@ -6064,7 +6064,7 @@ void MacroAssembler::compiler_fast_unlock_lightweight_object(Register obj, Regis #endif // ASSERT bind(inflated); - + z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset())); #ifdef ASSERT NearLabel check_done; z_aghi(top, -oopSize); that's why added that comment there. But personally I liked the suggestion you gave. I'll update the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1609545336 From tholenstein at openjdk.org Wed May 22 08:53:11 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 22 May 2024 08:53:11 GMT Subject: Integrated: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 11:10:08 GMT, Tobias Holenstein wrote: > The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. 
The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. > > There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. This pull request has now been integrated. Changeset: 3d511ff6 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/3d511ff63e59f542ae20c722bfef1c867cd1da0e Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod 8329748: Change default value of AssertWXAtThreadSync to true Reviewed-by: kvn, rrich ------------- PR: https://git.openjdk.org/jdk/pull/19102 From tholenstein at openjdk.org Wed May 22 08:53:10 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 22 May 2024 08:53:10 GMT Subject: RFR: 8329748: Change default value of AssertWXAtThreadSync to true In-Reply-To: References: Message-ID: On Mon, 6 May 2024 16:52:19 GMT, Vladimir Kozlov wrote: >> The debug flag `-XX:+AssertWXAtThreadSync` conservatively checks for correct W^X thread state at possible safepoints or handshake. The flag is useful to detect missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));`. Since the check is cheap and it is a `AARCH64_ONLY(develop(..))` only flag it makes sense to enable the flag by default. >> >> There was one missing `MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));` to make all tests (tier1-7) pass. > > Good. 
Thanks for the reviews @vnkozlov , @reinrich and @dean-long ------------- PR Comment: https://git.openjdk.org/jdk/pull/19102#issuecomment-2124223006 From ihse at openjdk.org Wed May 22 08:59:03 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 22 May 2024 08:59:03 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. 
>> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Build changes look good. Thanks for trimming down NATIVE_ACCESS_MODULES. ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19213#pullrequestreview-2070573791 From luhenry at openjdk.org Wed May 22 09:19:06 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 22 May 2024 09:19:06 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v13] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 12:50:18 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We have code that directly use the asm for call/jumps instead masm. >> Our masm have a bit odd naming, and we don't use 'proper' pseudoinstructions/mnemonics. >> Suggested by [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master) >> >> j offset jal x0, offset Jump >> jal offset jal x1, offset Jump and link >> jr rs jalr x0, rs, 0 Jump register >> jalr rs jalr x1, rs, 0 Jump and link register >> ret jalr x0, x1, 0 Return from subroutine >> call offset auipc x1, offset[31:12]; jalr x1, x1, offset[11:0] Call far-away subroutine >> tail offset auipc x6, offset[31:12]; jalr x0, x6, offset[11:0] Tail call far-away subroutine >> >> But these can only be implemented like this if you have small enough application. >> The fallback of these is to use GOT (your C compiler should place a copy of GOT every 2G so it's always reachable). >> We don't have GOT, instead we materialize, so there is still differences between these and ours. >> >> This patch: >> - Tries to follow these suggested mappings as good we can. >> - Make sure all jumps/calls go through MASM. 
(so we get control and can easily change for sites using a certain calling convention) >> - To avoid confusion between MASM public/private methods and ASM methods and the mnemonics there are some renaming. >> E.g. the mnemonics jal means call offset, as we can't use that so there is no 'jal'. >> - I enabled c.j, but right now we never generate it. >> - As always the macro does no good and are legacy from when code base did not use templates. (also the x-macros screws up my IDE (vim+rtags)) >> >> I started down this path due to I have followup patch on top of this which removes trampoline in favor for load-n-jump. >> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1) >> While looking into our calls it was a bit confusing, this helps. >> >> Done a couple of t1-3 slightly different version of this patch, and as part of the followup, no issues found. (VF2, qemu, LP4) >> Re-running tests, had some last minute changes. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into jal-fixes > - Use la() instead movptr where ok. > - Review changes > - Merge branch 'master' into jal-fixes > - Merge branch 'master' into jal-fixes > - Revert JNI field, call()->li() > - Use li instead of movptr for call > - REVERT: Use li instead of movptr > - Use li instead of movptr > - VM leaf should use li > - ... and 6 more: https://git.openjdk.org/jdk/compare/94f1c08c...d882cd59 Marked as reviewed by luhenry (Committer). 
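As a side note on the `call offset` row of the pseudoinstruction table quoted above (`auipc x1, offset[31:12]; jalr x1, x1, offset[11:0]`): because `jalr` sign-extends its 12-bit immediate, the upper part has to be rounded so that the two halves add back up to the original offset. A minimal sketch of that split follows; the names `HiLo`, `split_pc_relative` and `call_target` are hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

struct HiLo {
  int64_t hi20;  // value placed in the auipc immediate field
  int64_t lo12;  // sign-extended 12-bit immediate for jalr
};

// Splits a signed 32-bit pc-relative offset into auipc/jalr immediates.
HiLo split_pc_relative(int64_t offset) {
  assert(offset >= INT32_MIN && offset <= INT32_MAX);
  int64_t lo12 = offset & 0xfff;
  if (lo12 >= 0x800) lo12 -= 0x1000;  // model jalr's sign extension
  int64_t hi20 = (offset - lo12) / 4096;  // exact: the difference is a multiple of 4096
  return HiLo{hi20, lo12};
}

// Address reached by "auipc x1, hi20; jalr x1, x1, lo12" issued at pc.
int64_t call_target(int64_t pc, const HiLo& p) {
  return pc + p.hi20 * 4096 + p.lo12;
}
```

This only models the arithmetic of the two-instruction form; it says nothing about the trampoline or load-and-jump fallbacks discussed in the thread for targets that are out of range.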
------------- PR Review: https://git.openjdk.org/jdk/pull/18942#pullrequestreview-2070618310 From ayang at openjdk.org Wed May 22 09:30:23 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 22 May 2024 09:30:23 GMT Subject: RFR: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes Message-ID: Trivial removing dead code. ------------- Commit messages: - basm-trivial Changes: https://git.openjdk.org/jdk/pull/19345/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19345&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332676 Stats: 126 lines in 10 files changed: 0 ins; 126 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19345/head:pull/19345 PR: https://git.openjdk.org/jdk/pull/19345 From jsjolen at openjdk.org Wed May 22 10:01:30 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 22 May 2024 10:01:30 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v99] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Use a simple page-granular tracker to check consistency with ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/8e4f8bec..2de1de20 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=98 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=97-98 Stats: 123 lines in 1 file changed: 123 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Wed May 22 10:06:41 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 May 2024 10:06:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v100] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Use size_t ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/2de1de20..9f85a797 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=99 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=98-99 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From shade at openjdk.org Wed May 22 10:20:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 May 2024 10:20:06 GMT Subject: RFR: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active [v3] In-Reply-To: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com> References: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com> Message-ID: On Mon, 20 May 2024 16:59:25 GMT, William Kemper wrote: >> This is consistent with c1 and other platforms. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Dang. So we are regressing C1 performance a little here, by changing the single load-cmp to load-and-cmp. That is unfortunate, but might be acceptable? src/hotspot/share/gc/shenandoah/c1/shenandoahBarrierSetC1.cpp line 75: > 73: > 74: // Create a mask to test if the marking bit is set. > 75: // TODO: can we directly test if bit is set? No, we cannot: C1 LIR does not have a corresponding operation.
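To make the "load-cmp" versus "load-and-cmp" remark in the 8332082 review above concrete, here is one plausible reading as a small C++ sketch: a whole-byte test needs only a load and a compare, while testing one specific bit needs an extra AND with a mask. The bit assignments and function names below are hypothetical and chosen only for illustration; they are not taken from the Shenandoah sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical gc-state bit assignments, for illustration only.
constexpr uint8_t kHasForwardedBit = 1 << 0;
constexpr uint8_t kMarkingBit      = 1 << 1;

// "load-cmp": is any state bit set at all?
bool any_state_set(uint8_t gc_state) {
  return gc_state != 0;
}

// "load-and-cmp": mask out the marking bit first, then compare.
bool marking_active(uint8_t gc_state) {
  return (gc_state & kMarkingBit) != 0;
}
```

The two predicates disagree exactly when some other bit is set while marking is not, which would be the case a more precise test is meant to handle.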
------------- PR Review: https://git.openjdk.org/jdk/pull/19180#pullrequestreview-2070736549 PR Review Comment: https://git.openjdk.org/jdk/pull/19180#discussion_r1609666303 From jsjolen at openjdk.org Wed May 22 10:20:44 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 22 May 2024 10:20:44 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v101] In-Reply-To: References: Message-ID: <08ICYPmMNNb4BRlieWoyLtF8-65iK8PdsitM2aUVc-8=.6dd2c75b-35cc-412c-88e3-e0bd16af1686@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void 
ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... 
Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Off-by-1 - Fix - Assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/9f85a797..5c8d384a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=100 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=99-100 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From rehn at openjdk.org Wed May 22 10:23:11 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 10:23:11 GMT Subject: RFR: 8326306: RISC-V: Re-structure MASM calls and jumps [v13] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 09:15:55 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' into jal-fixes >> - Use la() instead movptr where ok. >> - Review changes >> - Merge branch 'master' into jal-fixes >> - Merge branch 'master' into jal-fixes >> - Revert JNI field, call()->li() >> - Use li instead of movptr for call >> - REVERT: Use li instead of movptr >> - Use li instead of movptr >> - VM leaf should use li >> - ... and 6 more: https://git.openjdk.org/jdk/compare/9862ee01...d882cd59 > > Marked as reviewed by luhenry (Committer). Thank you @luhenry. I did a full t1-t3 (t1 also with 2047) (VF2) I have seen no issues with this, I'll integrate later today.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18942#issuecomment-2124434347 From jsjolen at openjdk.org Wed May 22 10:23:49 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 May 2024 10:23:49 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v102] In-Reply-To: References: Message-ID: <2glJ9BKSUHldHO9FOqzcx-tC8cjXgb1l5x7eXjZmNQ8=.29c7c192-ba93-4593-9ecd-3d71e77e0b71@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings
at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Off-by-2 :) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/5c8d384a..8cf973af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=101 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=100-101 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From tschatzl at openjdk.org Wed May 22 10:37:00 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 22 May 2024 10:37:00 GMT Subject: RFR: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes In-Reply-To: References: Message-ID: On Wed, 22 May 2024 09:23:46 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code.
I think `Thread::allocated_bytes_offset()` can now also be removed. ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19345#pullrequestreview-2070812814 From shade at openjdk.org Wed May 22 11:11:04 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 May 2024 11:11:04 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 02:53:23 GMT, kuaiwei wrote: >> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: >> 1 It shows a regression on some platforms, such as Apple silicon on macOS >> 2 It cannot handle an instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" >> >> They can be fixed by: >> 1 Enabling AlwaysMergeDMB by default, and only disabling it on architectures where we can see a performance improvement (N1 or N2) >> 2 Checking for the special pattern and merging the subsequent dmb. >> >> It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these. >> >> This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions but cannot merge all three, because when emitting dmb.ish, merging all previous dmbs would shrink the code buffer. I think that may break some assumptions, and I think it's not a common pattern. >> >> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging. But it looks risky to keep instructions pending during emission.
> > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Make MacroAssembler::merge more clear Cursory review: src/hotspot/cpu/aarch64/globals_aarch64.hpp line 127: > 125: product(ccstr, UseBranchProtection, "none", \ > 126: "Branch Protection to use: none, standard, pac-ret") \ > 127: product(bool, AlwaysMergeDMB, true, DIAGNOSTIC, \ Suggestion: product(bool, AlwaysMergeDMB, true, DIAGNOSTIC, \ test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 93: > 91: } > 92: > 93: TEST_VM(AssemblerAArch64, merge_dmb) { Given the previous experience with barrier merges that prompted the backout, I would prefer to have a more comprehensive test here, maybe an additional one. I am thinking something like the exhaustive combination of 4 back-to-back barriers of each of 5 types. This gives us 5^4 = 625 test cases, which I think is still manageable. test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 198: > 196: } > 197: > 198: TEST_VM(AssemblerAArch64, merge_ldst) { This test seems to be irrelevant for the issue at hand? Tests `ld/st` -> `ldp/stp` merging, not the barrier merges? 
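The exhaustive combination suggested above can be enumerated mechanically. The following is only a minimal standalone sketch of that enumeration, not HotSpot test code: the `Barrier` enum and its five names are hypothetical placeholders for whichever five barrier types the test would cover, and `all_barrier_sequences()` is an illustrative helper. A real gtest would emit each sequence through the assembler and compare the resulting code against the expected merged form.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical barrier kinds; the names are placeholders (an assumption),
// not taken from the patch under review.
enum class Barrier { LoadLoad, LoadStore, StoreLoad, StoreStore, AnyAny };

constexpr std::size_t kKinds = 5;   // number of barrier types
constexpr std::size_t kLength = 4;  // back-to-back barriers per test case

using Sequence = std::array<Barrier, kLength>;

// Enumerate every length-4 sequence by counting in base 5:
// digit `pos` of the counter selects the barrier at position `pos`.
std::vector<Sequence> all_barrier_sequences() {
  std::size_t total = 1;
  for (std::size_t i = 0; i < kLength; i++) {
    total *= kKinds;  // 5^4 = 625
  }
  std::vector<Sequence> result;
  result.reserve(total);
  for (std::size_t n = 0; n < total; n++) {
    Sequence seq{};
    std::size_t rest = n;
    for (std::size_t pos = 0; pos < kLength; pos++) {
      seq[pos] = static_cast<Barrier>(rest % kKinds);
      rest /= kKinds;
    }
    result.push_back(seq);
  }
  return result;
}

// Sanity check on the enumeration itself.
void check_enumeration() {
  const std::vector<Sequence> seqs = all_barrier_sequences();
  assert(seqs.size() == 625);
}
```

Counting in base 5 keeps the generator free of nested loops, so changing the sequence length or the number of barrier kinds only means changing the two constants.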
------------- PR Review: https://git.openjdk.org/jdk/pull/19278#pullrequestreview-2070795468 PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1609708317 PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1609759284 PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1609707981 From duke at openjdk.org Wed May 22 11:15:02 2024 From: duke at openjdk.org (kuaiwei) Date: Wed, 22 May 2024 11:15:02 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 10:25:40 GMT, Aleksey Shipilev wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Make MacroAssembler::merge more clear > > test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 198: > >> 196: } >> 197: >> 198: TEST_VM(AssemblerAArch64, merge_ldst) { > > This test seems to be irrelevant for the issue at hand? Tests `ld/st` -> `ldp/stp` merging, not the barrier merges? In this patch, I fixed an issue, dmb/st/ld may not merge if CodeBuffer is expanding, I added some unit tests to check it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1609764081 From rehn at openjdk.org Wed May 22 11:16:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 11:16:04 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: <2f25EhAHETKwXhFcg6nE_W37QAU7U7opYHa8Wzo2MfU=.05e5cfce-2d3b-4825-a8af-7963d4c266f7@github.com> References: <2f25EhAHETKwXhFcg6nE_W37QAU7U7opYHa8Wzo2MfU=.05e5cfce-2d3b-4825-a8af-7963d4c266f7@github.com> Message-ID: On Tue, 21 May 2024 05:53:27 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - li48 -> movptr >> - Merge branch 'master' into 8332265 >> - li48 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1526: > >> 1524: } >> 1525: >> 1526: static address get_target_of_movptr2(address insn_addr) { > > Similar here. Maybe we can have a common entry of `get_target_of_movptr` which delegates work to `get_target_of_movptr1` and `get_target_of_movptr2`? Same here or? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609765802 From ayang at openjdk.org Wed May 22 11:26:13 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 22 May 2024 11:26:13 GMT Subject: RFR: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes [v2] In-Reply-To: References: Message-ID: > Trivial removing dead code. Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into basm-trivial - basm-trivial ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19345/files - new: https://git.openjdk.org/jdk/pull/19345/files/fb060cda..970ea0f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19345&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19345&range=00-01 Stats: 14 lines in 5 files changed: 8 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19345/head:pull/19345 PR: https://git.openjdk.org/jdk/pull/19345 From rehn at openjdk.org Wed May 22 11:51:10 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 11:51:10 GMT Subject: Integrated: 8326306: RISC-V: Re-structure MASM calls and jumps In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 07:17:07 GMT, Robbin Ehn wrote: > Hi, please consider. 
>
> We have code that directly uses the asm for calls/jumps instead of masm.
> Our masm has somewhat odd naming, and we don't use the 'proper' pseudoinstructions/mnemonics
> suggested by the [riscv-asm-manual](https://github.com/riscv-non-isa/riscv-asm-manual/tree/master):
>
>   Pseudoinstruction   Base instruction(s)                                   Meaning
>   j offset            jal x0, offset                                        Jump
>   jal offset          jal x1, offset                                        Jump and link
>   jr rs               jalr x0, rs, 0                                        Jump register
>   jalr rs             jalr x1, rs, 0                                        Jump and link register
>   ret                 jalr x0, x1, 0                                        Return from subroutine
>   call offset         auipc x1, offset[31:12]; jalr x1, x1, offset[11:0]    Call far-away subroutine
>   tail offset         auipc x6, offset[31:12]; jalr x0, x6, offset[11:0]    Tail call far-away subroutine
>
> But these can only be implemented like this if your application is small enough.
> The fallback for these is to use the GOT (your C compiler should place a copy of the GOT every 2G so it's always reachable).
> We don't have a GOT; instead we materialize, so there are still differences between these and ours.
>
> This patch:
> - Tries to follow these suggested mappings as well as we can.
> - Makes sure all jumps/calls go through MASM (so we get control and can easily change sites using a certain calling convention).
> - To avoid confusion between MASM public/private methods, ASM methods and the mnemonics, there is some renaming.
>   E.g. the mnemonic jal means call offset; as we can't use that, there is no 'jal'.
> - I enabled c.j, but right now we never generate it.
> - As always, the macros do no good and are a legacy from when the code base did not use templates (the x-macros also screw up my IDE (vim+rtags)).
>
> I started down this path because I have a follow-up patch on top of this which removes the trampoline in favor of load-n-jump.
> (WIP: https://github.com/robehn/jdk/compare/jal-fixes...robehn:jdk:load-n-link?expand=1)
> While looking into our calls it was a bit confusing, this helps.
>
> Done a couple of t1-3 runs on slightly different versions of this patch, and as part of the follow-up, no issues found.
(VF2, qemu, LP4) > Re-running tests, had some last minute changes. > > Thanks, Robbin This pull request has now been integrated. Changeset: c3bc23fe Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/c3bc23fe48ca1603afe68a6ac4aaa523a1edbb41 Stats: 363 lines in 9 files changed: 107 ins; 106 del; 150 mod 8326306: RISC-V: Re-structure MASM calls and jumps Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/18942 From azafari at openjdk.org Wed May 22 12:09:14 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 22 May 2024 12:09:14 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> Message-ID: <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> > This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: > 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. > Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. > > 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. 
Ignoring them is problematic when a requested region `R` is first size-aligned to `R1---R---R2` and `R1` and `R2` are then released (the `chop_extra_memory` function is called for this). In this case, NMT skips tracking `R1` and `R2` on the assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT region list, and when a new reserve happens in those regions, NMT complains by raising an exception. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: fixed the missing parts of shenandoahHeap.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19343/files - new: https://git.openjdk.org/jdk/pull/19343/files/6b6e2e12..86ae1e37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19343&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19343&range=00-01 Stats: 122 lines in 1 file changed: 94 ins; 23 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19343.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19343/head:pull/19343 PR: https://git.openjdk.org/jdk/pull/19343 From jsjolen at openjdk.org Wed May 22 13:59:33 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 May 2024 13:59:33 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v103] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - In-depth check - Remove asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/8cf973af..252350c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=102 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=101-102 Stats: 56 lines in 1 file changed: 51 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Wed May 22 14:07:28 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 May 2024 14:07:28 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v104] In-Reply-To: References: Message-ID: <5-fmd0kfcD7ASRjJNKiLdN4TpnxGppQhAnqSg0rU4ck=.ae99d39b-ee79-4d76-b4e0-333e7993c6b3@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Abstract out the page size - Lower number of in-depth checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/252350c2..75db7249 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=103 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=102-103 Stats: 19 lines in 1 file changed: 2 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From thartmann at openjdk.org Wed May 22 14:10:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 May 2024 14:10:14 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1878.955 ± 45.487 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1352.119 ±
23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply false thrpt 3 1746.126 ± 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.sign SHA256withECDSA 16384 256 thrpt 3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 1024 256 thrpt 3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 thrpt 3 1932.127 ± 35.920 ops/s >> Benchmark (algorithm) (keyLength) (kpgAlgorithm) (provider) Mode Cnt Score Error Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score Error Units >> PolynomialP256Bench.benchMultiply true thrpt 3 1919.57... > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 17 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into ecc-montgomery > - shenandoah verifier > - comments from Sandhya > - whitespace > - add message back > - whitespace > - Use AffinePoint to exit Montgomery domain > > Style notes: > Affine.equals() > - Mismatched fields only appear to be used from testing, perhaps should be moved there instead > Affine.getX(boolean)|getY(boolean) > - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks > - needed to convert Affine to Projective (need to stay in montgomery domain) > ECOperations.PointMultiplier > - changes could probably be restored to original (since ProjectivePoint handling no longer required) > - consider these changes an improvement? (fewer nested classes) > - was an inner-class but not using inner-class features (i.e. ecOps variable should be converted) > - whitespace > - Comments from Tony and Jatin > - Comments from Jatin and Tony > - ... and 7 more: https://git.openjdk.org/jdk/compare/45457761...b1a33004 All tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2124892444 From jsjolen at openjdk.org Wed May 22 14:18:21 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 May 2024 14:18:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: <2ybqkyANkDhUsvmOISMWO5_lnBpiNbsG9rA6u__iy70=.d6ea67eb-6df9-40b1-aae8-24c60202a5d4@github.com> On Wed, 22 May 2024 14:12:44 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT.
This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach.
Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Lower number of pages I've introduced a new test for the `VMATree` which uses a simple tracker (based on an array of page-sized slots) that performs random operations on the VMATree and simple tracker. These are then checked for consistency between the two, both by checking the summary diff and periodically a more in-depth check. The in-depth check loops over all of the pages in the simple tracker and finds ranges of regions, the start and end of these regions are looked for in the VMATree and checked that both flag and stack matches. It's a fairly costly test, taking approximately 2 minutes to run on one of my machines. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2124915448 From jsjolen at openjdk.org Wed May 22 14:12:44 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 May 2024 14:12:44 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Lower number of pages ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/75db7249..80605766 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=104 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=103-104 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From kbarrett at openjdk.org Wed May 22 14:13:04 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 22 May 2024 14:13:04 GMT Subject: RFR: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 11:26:13 GMT, Albert Mingkun Yang wrote: >> Trivial removing dead code. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into basm-trivial > - basm-trivial Looks good, and trivial (though maybe wait for @tschatzl to circle back). ------------- Marked as reviewed by kbarrett (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19345#pullrequestreview-2071360928 From tschatzl at openjdk.org Wed May 22 14:20:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 22 May 2024 14:20:15 GMT Subject: RFR: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 11:26:13 GMT, Albert Mingkun Yang wrote: >> Trivial removing dead code. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into basm-trivial > - basm-trivial Fine with me, thanks and trivial. There are two more mentions of `incr_allocated_bytes` in comments, please remove too. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19345#pullrequestreview-2071387350 From duke at openjdk.org Wed May 22 14:22:16 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 22 May 2024 14:22:16 GMT Subject: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12] In-Reply-To: References: Message-ID: <2HF_LGpK7B6i1UcgJ8g9JgzGF27gVAHZkGnVQk-Fo4w=.98339735-cd89-4059-a449-6a4911e9af41@github.com> On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score       Error    Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6443.934  ± 6.491    ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6152.979  ± 4.954    ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1895.410  ± 36.979   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1878.955  ± 45.487   ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score       Error    Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1357.810  ± 26.584   ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1352.119  ± 23.547   ops/s
>> Benchmark                          (isMontBench)  Mode   Cnt  Score       Error    Units
>> PolynomialP256Bench.benchMultiply  false          thrpt  3    1746.126  ± 10.970   ops/s
>>
>> Performance, no intrinsic:
>>
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score       Error    Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6529.839  ± 42.420   ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6199.747  ± 133.566  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1973.676  ± 54.071   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1932.127  ± 35.920   ops/s
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score       Error    Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1355.788  ± 29.858   ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1346.523  ± 28.722   ops/s
>> Benchmark                          (isMontBench)  Mode   Cnt  Score       Error    Units
>> PolynomialP256Bench.benchMultiply  true           thrpt  3    1919.57...
>
> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 17 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into ecc-montgomery > - shenandoah verifier > - comments from Sandhya > - whitespace > - add message back > - whitespace > - Use AffinePoint to exit Montgomery domain > > Style notes: > Affine.equals() > - Mismatched fields only appear to be used from testing, perhaps should be moved there instead > Affine.getX(boolean)|getY(boolean) > - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks > - needed to convert Affine to Projective (need to stay in montgomery domain) > ECOperations.PointMultiplier > - changes could probably be restored to original (since ProjectivePoint handling no longer required) > - consider these changes an improvement? (fewer nested classes) > - was an inner-class but not using inner-class features (i.e. ecOps variable should be converted) > - whitespace > - Comments from Tony and Jatin > - Comments from Jatin and Tony > - ... and 7 more: https://git.openjdk.org/jdk/compare/c0032e2c...b1a33004 Thanks Tobi! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2124924526 From sgibbons at openjdk.org Wed May 22 14:26:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:26:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v6] In-Reply-To: References: Message-ID: <-IZk0dL-Bd2Gp5zsI3DSsHzNl6-6lB_8HRd4KkBUALw=.0ee706a8-9281-40f8-a0ba-d53385edcdcf@github.com> On Tue, 9 Jan 2024 15:06:10 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. 
>> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - Stomped on r13 in switch branch calculation >> - ... and 11 more: https://git.openjdk.org/jdk/compare/8a4dc79e...600377b0 > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1608: > >> 1606: // vector compares when size is 2 * VEC_SIZE or less. 38 8. Use 4 >> 1607: // vector compares when size is 4 * VEC_SIZE or less. 39 9. Use 8 >> 1608: // vector compares when size is 8 * VEC_SIZE or less. */ > > Is this formatting intended? Fixed > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1672: > >> 1670: >> 1671: // 98 VPCMPEQ VEC_SIZE(%rdi), %ymm2, %ymm2 >> 1672: // 99 vpmovmskb %ymm2, %eax > > It seems that here the comments and code is strangely interleaved / shifted. What is this all for? All this has been remedied > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 2301: > >> 2299: // 388 setg %dl >> 2300: // 389 leal -1(%rdx, %rdx), %eax >> 2301: __ movzbl(rcx, Address(rsi, rax, Address::times_1, -0x20)); > > Down here it is even worse All this has been remedied ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610074501 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610076284 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610076661 From ayang at openjdk.org Wed May 22 14:27:16 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 22 May 2024 14:27:16 GMT Subject: RFR: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes [v3] In-Reply-To: References: Message-ID: <_OEn7BK9EykA6z5ARry8eu17tV3z3bS0jKZyw9huz74=.92b76c58-0941-41e5-86de-b430d902e8fd@github.com> > Trivial removing dead code. 
Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19345/files - new: https://git.openjdk.org/jdk/pull/19345/files/970ea0f9..66ce9202 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19345&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19345&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19345/head:pull/19345 PR: https://git.openjdk.org/jdk/pull/19345 From mbaesken at openjdk.org Wed May 22 14:35:27 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 22 May 2024 14:35:27 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' Message-ID: When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp . 
/jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) ------------- Commit messages: - JDK-8332720 Changes: https://git.openjdk.org/jdk/pull/19349/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19349&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332720 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19349.diff 
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19349/head:pull/19349 PR: https://git.openjdk.org/jdk/pull/19349 From stefank at openjdk.org Wed May 22 14:48:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 22 May 2024 14:48:02 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:30:01 GMT, Matthias Baesken wrote: > When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr > this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp . > > /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' > #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 > #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > #10 0x7fed0b7b182d in VMThread::run() 
/jdk/src/hotspot/share/runtime/vmThread.cpp:177 > #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 > #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 > #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) src/hotspot/share/oops/instanceKlass.cpp line 3552: > 3550: st->print(BULLET"default_methods: "); > 3551: if (default_methods() != nullptr) { default_methods()->print_value_on(st); } > 3552: st->cr(); The `default_vtable_indices()` printing looks like this: if (default_vtable_indices() != nullptr) { st->print(BULLET"default vtable indices: "); default_vtable_indices()->print_value_on(st); st->cr(); } Should this change make the code follow the same pattern? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19349#discussion_r1610132839 From mbaesken at openjdk.org Wed May 22 14:53:05 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 22 May 2024 14:53:05 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:30:01 GMT, Matthias Baesken wrote: > When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr > this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp .
> > /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' > #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 > #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 > #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 > #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 > #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > Should this change make the code follow the same pattern? Good question, I considered this too. Should I change it? What do you think?
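The two printing patterns under discussion (always print the label and guard only the member call, versus skip the whole entry when the array is absent) can be compared in a small standalone sketch. This is not HotSpot code: `MethodArray`, `KlassSketch` and `render` are hypothetical stand-ins invented for illustration, and the output format is an assumption.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for HotSpot's Array<Method*>; not the real class.
struct MethodArray {
  std::vector<std::string> names;
  void print_value_on(std::ostream& st) const {
    st << "Array(" << names.size() << ")";
  }
};

// Hypothetical stand-in for InstanceKlass with one optional array.
struct KlassSketch {
  const MethodArray* default_methods = nullptr;

  // Pattern A: always print the label, guard only the member call.
  void print_always_label(std::ostream& st) const {
    st << " - default_methods: ";
    if (default_methods != nullptr) {
      default_methods->print_value_on(st);
    }
    st << "\n";
  }

  // Pattern B: skip the whole entry when the array is absent.
  void print_only_if_present(std::ostream& st) const {
    if (default_methods != nullptr) {
      st << " - default_methods: ";
      default_methods->print_value_on(st);
      st << "\n";
    }
  }
};

// Renders one of the two patterns into a string for comparison.
std::string render(const KlassSketch& k, bool always_label) {
  std::ostringstream out;
  if (always_label) {
    k.print_always_label(out);
  } else {
    k.print_only_if_present(out);
  }
  return out.str();
}
```

Both variants avoid the member call on a null pointer that ubsan reported; they differ only in whether an empty entry shows up in the output, which is the consistency question raised in the review.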
------------- PR Comment: https://git.openjdk.org/jdk/pull/19349#issuecomment-2125000006 From sgibbons at openjdk.org Wed May 22 14:53:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v13] In-Reply-To: References: Message-ID: On Mon, 26 Feb 2024 14:50:30 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed some review coments; replaced hard-coded registers with descriptive names. > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 303: > >> 301: __ subq(rdi, rax); >> 302: __ movq(rdx, rdi); >> 303: __ andq(rdx, -16); > > Hi @asgibbons , may I request you to please use meaningful names instead of directly using actual GPR names to ease the review process. Done. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 777: > >> 775: __ movq(rax, rbx); >> 776: __ movq(rbx, r14); >> 777: __ leaq(r15, Address(r12, -0x2)); > > Kindly use semantically meaningful names instead of direct GPR names. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610121347 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610121724 From sgibbons at openjdk.org Wed May 22 14:53:17 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: <8ifsYHB0SLuD1ZbWhMWmBZn_UjW-iNpXrmsIkZFUczg=.ce670add-3afb-48be-8c81-2fd462d19bbd@github.com> On Mon, 6 May 2024 23:19:07 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329: > >> 327: //////////////////////////////////////////////////////////////////////////////////////// >> 328: >> 329: __ bind(L_begin); > > So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes. A high level algorithm description here is needed in comments to follow the code below. A description of what are the various paths in terms of haystack and needle sizes and how to reason the assembly code below and make sure that all the paths are taken care of. Also the abstraction level suddenly changes here to detailed code below instead of methods for the various paths. I added a description. Can you please check to ensure it meets your objective? Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610124233 From sgibbons at openjdk.org Wed May 22 14:53:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: <0XxCusssrDiiKzXBfdsY1XHkv9T6mJwJe7dwFz5Uy-I=.3325e496-5bf1-4a79-8969-e28e018b77db@github.com> References: <0XxCusssrDiiKzXBfdsY1XHkv9T6mJwJe7dwFz5Uy-I=.3325e496-5bf1-4a79-8969-e28e018b77db@github.com> Message-ID: On Tue, 16 Jan 2024 13:26:15 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 417: > >> 415: __ cmpl(Address(rbx, r15, Address::times_1, -0x14), rax); >> 416: __ jne(L_top_loop_1); >> 417: __ jmp(L_0x406019); > > For cases which are multiple of 4 bytes we can use VMASKMOVPS (conditional moves) and VPTEST. Not sure what you mean here. Could you elaborate (although it may be moot after all the changes)? > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1526: > >> 1524: __ movq(rdx, r8); >> 1525: __ movq(rcx, r9); >> 1526: #endif > > Can we spill them into XXMs, to save costly stack operations. Changed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1545: > >> 1543: // return 0; >> 1544: // } >> 1545: __ movq(r12, rcx); > > Check for K == 0 should use rsi. k is needle length, which is rcx. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1545: > >> 1543: // return 0; >> 1544: // } >> 1545: __ movq(r12, rcx); > > Kindly use meaningful variable and label names. It will ease the review process and maintenance. Done. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1551: > >> 1549: __ movq(r15, rsi); >> 1550: __ movq(r11, rdi); >> 1551: __ cmpq(rsi, 0x20); > > Comparisons with 32 bit integer length can use cmpl instead of cmpq, this may save emitting REX encoding prefix if index is allocated a GPR from lower register bank (no need for setting REX.W). I fixed as many as I could find. Thanks. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1552: > >> 1550: __ movq(r11, rdi); >> 1551: __ cmpq(rsi, 0x20); >> 1552: __ jb(L_small_string); > > All the comparisons against needle length are signed integer comparisons, so jb should be replaced by jl I'm treating everything as unsigned except where intentional negative values are used. It never makes sense for needle length to be negative. 
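The `jb` versus `jl` point can be illustrated outside the assembler. The sketch below is plain C++ that only models the two branch conditions after a 32-bit compare; the function names are made up for this example and nothing here is taken from the stub generator.

```cpp
#include <cassert>
#include <cstdint>

// Models `cmp len, limit` followed by `jb`: taken when len is below limit
// under an unsigned interpretation of both operands.
bool below_unsigned(int32_t len, int32_t limit) {
  return static_cast<uint32_t>(len) < static_cast<uint32_t>(limit);
}

// Models `cmp len, limit` followed by `jl`: taken when len is less than
// limit under a signed interpretation of both operands.
bool less_signed(int32_t len, int32_t limit) {
  return len < limit;
}
```

For every non-negative length the two conditions agree, so the choice only matters if a negative value could ever reach the compare; under the unsigned reading such a value looks like a very large length and the "small string" branch is not taken.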
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610118449 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610110754 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610105405 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610111320 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610113343 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610116033 From sgibbons at openjdk.org Wed May 22 14:53:27 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:27 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 07:08:31 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 505: >> >>> 503: __ cmpb(Address(rbx, r15, Address::times_1, -0xa), rax); >>> 504: __ jne(L_top_loop_1); >>> 505: __ jmp(L_0x406019); >> >> Instead of having special handling for each tail size (3 - 31 bytes), can we directly use 32 bytes VMASKMOVPS with appropriate mask for different tail sizes and only residual part (0 - 3 bytes) can fall over to scalar tail. > > Basically tail size can be rounded to nearest multiple of doubleword. I have since changed the algorithm due to request from @sviswa7 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610120366 From sgibbons at openjdk.org Wed May 22 14:53:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:28 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v5] In-Reply-To: References: Message-ID: <8FGB4fvnPGhSSdLgY5POXyGajpA-b-Ir31ee1WrG660=.0afedbf4-b717-4d1a-a3f0-c36b5e02a4d8@github.com> On Mon, 8 Jan 2024 10:32:51 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review comments. 
> > src/hotspot/share/opto/library_call.cpp line 1273: > >> 1271: Node* result = nullptr; >> 1272: >> 1273: if ((StubRoutines::string_indexof() != nullptr) && (ae == StrIntrinsicNode::LL)) { > > Why are we not calling stub for StrIntrinsicNode::UU Stub being called for LL, UL, and UU now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610089409 From sgibbons at openjdk.org Wed May 22 14:53:30 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:30 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v6] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 15:14:41 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - Stomped on r13 in switch branch calculation >> - ... and 11 more: https://git.openjdk.org/jdk/compare/8a4dc79e...600377b0 > > test/jdk/java/lang/StringBuffer/IndexOf.java line 34: > >> 32: public class IndexOf { >> 33: >> 34: static Random generator = new Random(1999); > > Would it be an alternative to use the this: > > import jdk.test.lib.Utils; > ... > Random random = Utils.getRandomInstance(); > > This has a random seed, but it is always printed in the output and can be set via a test-flag. Changed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 44: > >> 42: } >> 43: System.out.println(""); >> 44: generator.setSeed(1999); > > Is there a good reason for a fixed seed? Nope :-). Needed consistency during testing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610087089 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610088114 From sgibbons at openjdk.org Wed May 22 14:53:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:18:27 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 54: > >> 52: // for (int i = 1; i < 128; i++) { >> 53: // haystack_16[i] = (char) (i); >> 54: // } > > dead code Removed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 83: > >> 81: shs = "$&),,18+-!'8)+"; >> 82: endNeedle = "8)-"; >> 83: l_offset = 9; > > dead code Fixed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 237: > >> 235: + sourceBuffer.toString() + " len Buffer = " + sourceBuffer.toString().length()); >> 236: System.err.println(" naive = " + naiveFind(sourceBuffer.toString(), targetString, 0) + ", IndexOf = " >> 237: + sourceBuffer.indexOf(targetString)); > > More tracing left behind here and rest of this function (original just recorded failure and moved along) I think it's more valuable for a test to print out what it can when a failure occurs rather than just saying "failed". > test/jdk/java/lang/StringBuffer/IndexOf.java line 284: > >> 282: >> 283: // Note: it is possible although highly improbable that failCount will >> 284: // be > 0 even if everthing is working ok > > This sounds like either a bug or a testcase bug? Same as line 301, `extremely remote possibility of > 1 match`? This was there from the original author. I think they were trying to infer that a match could occur in the rare case that the same random string was produced. 
They're random after all, and there's no reason the same sequence couldn't be generated. > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 81: > >> 79: lateMatchString16 = dataStringHuge16.substring(dataStringHuge16.length() - 31); >> 80: >> 81: searchString = "oscar"; > > Would had liked to see a few more small needles (i.e. to test/verify individual switch statement cases) I'm hoping we can incorporate your test to cover more cases :-). > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 132: > >> 130: @Benchmark >> 131: public int searchHugeLargeSubstring() { >> 132: return dataStringHuge.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); > > .repeat() call and string concatenation shouldn't be part of the benchmark (here and several other @Benchmark functions in this file) since it will detract from the measurement. > > (String concatenation gets converted (by javac) into StringBuilder().append().append()....append().toString()) Since we're only concerned with the delta of performance, does this really matter? Can you suggest an alternative? > test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 242: > >> 240: @Benchmark >> 241: public int search16HugeLargeSubstring16() { >> 242: return dataStringHuge16.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); > > `search16HugeLargeSubstring16` implies UU, but with `"B".repeat(30) + "X" + "A".repeat(30)` is UL Fixed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610131285 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610134566 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610138116 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610142104 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610130140 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610126743 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610128630 From sgibbons at openjdk.org Wed May 22 14:53:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 14:53:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v7] In-Reply-To: <3m2_CQE-NHOCN20Z4LbosqwihcUCVopTgycXADInLEI=.25f797e8-e620-4f10-9da0-245a890c41de@github.com> References: <3m2_CQE-NHOCN20Z4LbosqwihcUCVopTgycXADInLEI=.25f797e8-e620-4f10-9da0-245a890c41de@github.com> Message-ID: On Mon, 15 Jan 2024 13:30:42 GMT, Andrey Turbanov wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 > > test/jdk/java/lang/StringBuffer/IndexOf.java line 220: > >> 218: >> 219: for (int x = 0; x < 1000000; x++) { >> 220: if(make_new) { > > Suggestion: > > if (make_new) { Fixed. > test/jdk/java/lang/StringBuffer/IndexOf.java line 262: > >> 260: } >> 261: >> 262: if(make_new) > > Suggestion: > > if (make_new) Fixed. 
> test/jdk/java/lang/StringBuffer/IndexOf.java line 295:
>
>> 293:       }
>> 294:
>> 295:       if(make_new) testIndex = getRandomIndex(-100, 100);
>
> Suggestion:
>
>       if (make_new) testIndex = getRandomIndex(-100, 100);

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610093771
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610094790
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610097958

From fyang at openjdk.org Wed May 22 14:58:10 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 22 May 2024 14:58:10 GMT
Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 08:35:33 GMT, Robbin Ehn wrote:

>> Hi, please consider!
>>
>> Materializing a 48-bit pointer, using an additional register, can be done with:
>> lui + lui + slli + add + addi
>> This is 15% faster both on VF2 and in CPU models, compared to movptr().
>>
>> As we often materialize during calls, there are free registers.
>>
>> I have chosen just a few spots to use it; many more could use it.
>> E.g. la() with a tmp register can use li48 instead of movptr.
>>
>> Running tests now (so far so good); if I screwed up IC calls, it should be seen quickly.
>> And benchmarks when hardware is free.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
> - Review changes
> - Merge branch 'master' into 8332265
> - Merge branch 'master' into 8332265
> - Small review update
> - li48 -> movptr
> - Merge branch 'master' into 8332265
> - li48

Three more minor comments, looks good otherwise. Thanks.
src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 141: > 139: // add > 140: // addi/jalr/load > 141: static bool check_movptr2_data_dependency(address instr) { Better to rename the existing `check_movptr_data_dependency` as `check_movptr1_data_dependency` at the same time. src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 421: > 419: void flush() { > 420: if (!maybe_cpool_ref(instruction_address())) { > 421: ICache::invalidate_range(instruction_address(), movptr1_instruction_size /* > movptr2_instruction_size */); Maybe we can simply remove this `flush()` member function which is not used anywhere. src/hotspot/cpu/riscv/riscv.ad line 1289: > 1287: { > 1288: // skip the movptr2 in MacroAssembler::ic_call(): > 1289: // lui + addi + slli + addi + slli + addi You might also want to update this instruction sequence in the code comment to reflect `movptr2()`. ------------- PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2071081979 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609882747 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1610113183 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1609960292 From fyang at openjdk.org Wed May 22 14:58:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 22 May 2024 14:58:11 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v2] In-Reply-To: References: <2f25EhAHETKwXhFcg6nE_W37QAU7U7opYHa8Wzo2MfU=.05e5cfce-2d3b-4825-a8af-7963d4c266f7@github.com> Message-ID: On Wed, 22 May 2024 11:13:27 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1526: >> >>> 1524: } >>> 1525: >>> 1526: static address get_target_of_movptr2(address insn_addr) { >> >> Similar here. Maybe we can have a common entry of `get_target_of_movptr` which delegates work to `get_target_of_movptr1` and `get_target_of_movptr2`? > > Same here or? Yeah, just keep the current shape. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1610150278 From stefank at openjdk.org Wed May 22 15:01:01 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 22 May 2024 15:01:01 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' In-Reply-To: References: Message-ID: <2HX_7ZjSsgik_eJXniCBDmivFrysHTaU8x1Ot-bIrFg=.3c32d22c-d987-4877-8b60-5f440d9658a5@github.com> On Wed, 22 May 2024 14:50:28 GMT, Matthias Baesken wrote: > > Should this change make the code follow the same pattern? > > Good question, I considered this too. Should I change it what do you think? I usually don't look at these fields, so maybe someone from the Runtime team can help decide? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19349#issuecomment-2125019467 From sgibbons at openjdk.org Wed May 22 15:05:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 15:05:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Fri, 17 May 2024 22:37:13 GMT, Sandhya Viswanathan wrote: >> Not sure what you mean here. I *think* you mean that hsLength is not the length of the remaining bytes in the haystack, but the actual length. There may be an issue if that is correct, right? I'll investigate. > > Yes, that is what I meant. Thanks for investigating. I've moved the code checking for (n-k)<32 to `big_case_loop_helper`, so there's no need for this in here any longer. Removing unneeded parameters from `compare_big_haystack_to_needle`. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610166656

From jsjolen at openjdk.org Wed May 22 15:34:17 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Wed, 22 May 2024 15:34:17 GMT
Subject: RFR: 8331193: Return references when possible in GrowableArray [v7]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`.
>
>
> Some example code:
> ```c++
> // Before this patch this worked:
> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s
> int& x = arr.at(7);
> if (x == -1) {
>   x = 2;
> }
> assert(arr.at(7) == 2, "this holds");
> // but this was forbidden
> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E&
> // so we had to do
> int x = arr.at_grow(9, -1);
> if (x == -1) {
>   arr.at_put(9, 2);
> }
> ```
>
> Thanks.

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

Add test

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/18975/files
- new: https://git.openjdk.org/jdk/pull/18975/files/7a575e5a..f71e2ce2

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=06
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=05-06

Stats: 14 lines in 1 file changed: 14 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/18975.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18975/head:pull/18975

PR: https://git.openjdk.org/jdk/pull/18975

From coleenp at openjdk.org Wed May 22 15:43:04 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 22 May 2024 15:43:04 GMT
Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array'
In-Reply-To:
References:
Message-ID: <9gFWhtZ2HPjQTtVCA7ZmyPtMZYNeeOdkBzxqaYYn67k=.603bd077-df2d-4419-9e08-27e87ca2ac5b@github.com>

On Wed, 22 May 2024 14:45:08 GMT, Stefan Karlsson wrote:

>> When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr
>> this error is reported; it seems we are missing a nullptr check that is in place in similar code in instanceKlass.cpp.
>>
>> /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array'
>> #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550
>> #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228
>> #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387
>> #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303
>> #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989
>> #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75
>> #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283
>> #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427
>> #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493
>> #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478
>> #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177
>> #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225
>> #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846
>> #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73)
>> #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37)
>
> src/hotspot/share/oops/instanceKlass.cpp line 3552:
>
>> 3550:   st->print(BULLET"default_methods: ");
>> 3551:   if (default_methods() != nullptr) { default_methods()->print_value_on(st); }
>> 3552:   st->cr();
>
> The `default_vtable_indices()` printing looks like this:
>
> if (default_vtable_indices() != nullptr) { > st->print(BULLET"default vtable indices: "); default_vtable_indices()->print_value_on(st); st->cr(); > } > > Should this change make the code follow the same pattern? Yes, can you put the BULLET"default methods inside the condition to test if default methods are non-null? You could fold it into the below conditional, and have the Verbose test to print each method. Aside, I thought there was supposed to be a blank in between concatenated strings because some compiler complained. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19349#discussion_r1610227053 From jsjolen at openjdk.org Wed May 22 15:44:04 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 22 May 2024 15:44:04 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v6] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 13:22:30 GMT, Emanuel Peter wrote: >>> Can you add a regression test that checks exactly the example that you have in your PR descrition? >> >> Hi Emanuel, >> >> I've been thinking about this a bit. We can add such a test, but it would essentially be a test that checks whether something compiles or not (which it trivially should). We can still add the test you suggested if you feel it is necessary, but it doesn't seem to me like it adds much value? >> >> Cheers, >> Johan > > @jdksjolen I mean we already test the other methods, so why not test this one? And it is not just about compilation working, but also the results that the new methods return. > > If @stefank @kimbarrett say this is not necessary, then ignore this. Not sure how rigorous they want GrowableArray tested. @eme64 , I added a test and it passes, does this seem OK? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18975#issuecomment-2125116060 From dcubed at openjdk.org Wed May 22 16:11:03 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 22 May 2024 16:11:03 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Wed, 22 May 2024 12:09:14 GMT, Afshin Zafari wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT region-list and when a new reserve happens at that regions, NMT complains by raising an exception. 
> > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fixed the missing parts of shenandoahHeap.cpp Please document the testing that has been done on this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2125172737 From sgibbons at openjdk.org Wed May 22 16:25:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 16:25:24 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v22] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Added comments; 
move n-k<32 code up a level

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/16753/files
- new: https://git.openjdk.org/jdk/pull/16753/files/38868a35..f4ca4a5e

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=21
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=20-21

Stats: 214 lines in 4 files changed: 100 ins; 72 del; 42 mod
Patch: https://git.openjdk.org/jdk/pull/16753.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753

PR: https://git.openjdk.org/jdk/pull/16753

From duke at openjdk.org Wed May 22 16:30:14 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Wed, 22 May 2024 16:30:14 GMT
Subject: Integrated: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic
In-Reply-To:
References:
Message-ID:

On Tue, 2 Apr 2024 15:42:05 GMT, Volodymyr Paprotski wrote:

> Performance. Before:
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6443.934 ±   6.491  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6152.979 ±   4.954  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1895.410 ±  36.979  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1878.955 ±  45.487  ops/s
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1357.810 ±  26.584  ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1352.119 ±  23.547  ops/s
> Benchmark                          (isMontBench)   Mode  Cnt     Score     Error  Units
> PolynomialP256Bench.benchMultiply          false  thrpt    3  1746.126 ±  10.970  ops/s
>
> Performance, no intrinsic:
>
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6529.839 ±  42.420  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6199.747 ± 133.566  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1973.676 ±  54.071  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1932.127 ±  35.920  ops/s
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1355.788 ±  29.858  ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1346.523 ±  28.722  ops/s
> Benchmark                          (isMontBench)   Mode  Cnt     Score     Error  Units
> PolynomialP256Bench.benchMultiply           true  thrpt    3  1919.574 ±  10.591  ops/s
>
> Performance, **with intrinsics*...

This pull request has now been integrated.

Changeset: afed7d0b
Author: Volodymyr Paprotski
Committer: Sandhya Viswanathan
URL: https://git.openjdk.org/jdk/commit/afed7d0b0593864e5595840a6b645c210ff28c7c
Stats: 2409 lines in 36 files changed: 2093 ins; 156 del; 160 mod

8329538: Accelerate P256 on x86_64 using Montgomery intrinsic

Reviewed-by: ihse, ascarpino, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/18583

From sgibbons at openjdk.org Wed May 22 16:36:41 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Wed, 22 May 2024 16:36:41 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v23]
In-Reply-To:
References:
Message-ID:

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Adding exhaustive test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f4ca4a5e..b6d77fe0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=21-22 Stats: 249 lines in 1 file changed: 249 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 22 16:39:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 16:39:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 
[v22] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 16:25:24 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Added comments; move n-k<32 code up a level By her request ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2125255793 From sgibbons at openjdk.org Wed May 22 16:54:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 16:54:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v24] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the 
pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Added header file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/b6d77fe0..f002fd54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=22-23 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From gziemski at openjdk.org Wed May 22 17:21:21 2024 From: gziemski at openjdk.org 
(Gerard Ziemski)
Date: Wed, 22 May 2024 17:21:21 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97]
In-Reply-To:
References:
Message-ID:

On Tue, 21 May 2024 14:51:40 GMT, Gerard Ziemski wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix copyright
>
> src/hotspot/share/nmt/nmtTreap.hpp line 236:
>
>> 234:   }
>> 235:
>> 236:   void upsert(const K& k, const V& v) {
>
> Could we rename this to simply `add()` instead of `upsert()` ?

I would take `insert()` over `upsert()` too, if you don't like `add()` :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1610374113

From gziemski at openjdk.org Wed May 22 17:25:16 2024
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Wed, 22 May 2024 17:25:16 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 14:12:44 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo...
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
> Lower number of pages

We claim that:

> Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark.

May I ask how you ran it? I would like to be able to reproduce our claim.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2125372965

From wkemper at openjdk.org Wed May 22 17:34:04 2024
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 22 May 2024 17:34:04 GMT
Subject: RFR: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active [v3]
In-Reply-To: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com>
References: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com>
Message-ID:

On Mon, 20 May 2024 16:59:25 GMT, William Kemper wrote:

>> This is consistent with c1 and other platforms.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix typo

We usually benchmark with tiered compilation disabled. I will run some tests with tiered compilation configured to stop at level 1.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19180#issuecomment-2125387707

From sgibbons at openjdk.org Wed May 22 17:40:24 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Wed, 22 May 2024 17:40:24 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v25]
In-Reply-To:
References:
Message-ID:

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: un-helper-ize preload_needle_helper; try fix for macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f002fd54..b0ef5e6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=23-24 Stats: 102 lines in 1 file changed: 5 ins; 91 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From mseledtsov at openjdk.org Wed May 22 18:22:11 2024 From: mseledtsov at openjdk.org (Mikhailo Seledtsov) Date: Wed, 22 May 2024 18:22:11 GMT Subject: 
RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed Message-ID: Please review this trivial problem listing change. ------------- Commit messages: - 8332739: problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed Changes: https://git.openjdk.org/jdk/pull/19351/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19351&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332739 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19351/head:pull/19351 PR: https://git.openjdk.org/jdk/pull/19351 From rehn at openjdk.org Wed May 22 18:41:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 18:41:17 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v4] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 08:35:33 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Review changes > - Merge branch 'master' into 8332265 > - Merge branch 'master' into 8332265 > - Small review update > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 Yes, thanks, done! 
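The `lui + lui + slli + add + addi` sequence quoted above can be checked with ordinary integer arithmetic. What follows is only a sketch, not the code from the PR: the helper names (`lui`, `split48`, `materialize`) are hypothetical, and the 18/18/12-bit split with a shift of 18 is an assumption chosen so that each `lui` immediate stays non-negative. It models RV64 `lui` as "shift the 20-bit immediate left by 12 and sign-extend the 32-bit result" and `addi` as adding a sign-extended 12-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the three immediates used by the sequence.
struct Parts {
  int64_t upper;  // goes into the temp register via lui, then slli by 18
  int64_t mid;    // goes into the destination register via lui
  int64_t low12;  // signed 12-bit immediate for the final addi
};

// RV64 lui: imm20 is placed in bits 31..12, the 32-bit result is sign-extended.
inline int64_t lui(int64_t imm20) {
  return static_cast<int64_t>(
      static_cast<int32_t>(static_cast<uint32_t>(imm20) << 12));
}

// Split a 48-bit address so that upper * 2^30 + mid * 2^12 + low12 == addr.
inline Parts split48(uint64_t addr) {
  assert(addr < (1ULL << 48));
  int64_t low12 = static_cast<int64_t>(addr & 0xfff);
  if (low12 >= 0x800) {
    low12 -= 0x1000;  // addi sign-extends, so compensate in the upper parts
  }
  int64_t rest = static_cast<int64_t>(addr) - low12;  // multiple of 4096
  return Parts{rest >> 30, (rest >> 12) & 0x3ffff, low12};
}

// Replays the five instructions on plain integers.
inline uint64_t materialize(uint64_t addr) {
  Parts p = split48(addr);
  int64_t tmp = lui(p.upper);  // lui  tmp, upper
  int64_t rd = lui(p.mid);     // lui  rd,  mid
  tmp <<= 18;                  // slli tmp, tmp, 18
  rd += tmp;                   // add  rd,  rd, tmp
  rd += p.low12;               // addi rd,  rd, low12
  return static_cast<uint64_t>(rd);
}
```

In this model the two `lui` instructions do not depend on each other, which is presumably where the benefit of the extra register comes from; the sketch says nothing about actual timing.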
------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2125500938 From rehn at openjdk.org Wed May 22 18:41:19 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 18:41:19 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v4] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 12:43:50 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Review changes >> - Merge branch 'master' into 8332265 >> - Merge branch 'master' into 8332265 >> - Small review update >> - li48 -> movptr >> - Merge branch 'master' into 8332265 >> - li48 > > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 141: > >> 139: // add >> 140: // addi/jalr/load >> 141: static bool check_movptr2_data_dependency(address instr) { > > Better to rename the existing `check_movptr_data_dependency` as `check_movptr1_data_dependency` at the same time. Fixed > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 421: > >> 419: void flush() { >> 420: if (!maybe_cpool_ref(instruction_address())) { >> 421: ICache::invalidate_range(instruction_address(), movptr1_instruction_size /* > movptr2_instruction_size */); > > Maybe we can simply remove this `flush()` member function which is not used anywhere. Fixed > src/hotspot/cpu/riscv/riscv.ad line 1289: > >> 1287: { >> 1288: // skip the movptr2 in MacroAssembler::ic_call(): >> 1289: // lui + addi + slli + addi + slli + addi > > You might also want to update this instruction sequence in the code comment to reflect `movptr2()`. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1610475284 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1610474893 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1610475031 From rehn at openjdk.org Wed May 22 18:41:16 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 May 2024 18:41:16 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v5] In-Reply-To: References: Message-ID: > Hi, please consider! > > Materializing a 48-bit pointer, using an additional register, we can do with: > lui + lui + slli + add + addi > This 15% faster both on VF2 and in CPU models, compared to movptr(). > > As we often materialize during calls there is free registers. > > I have choose just a few spot to use it, many more can use. > E.g. la() with tmp register can use li48 instead of movptr. > > Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. > And benchmarks when hardware is free. Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: - Merge branch 'master' into 8332265 - More review comments - Review changes - Merge branch 'master' into 8332265 - Merge branch 'master' into 8332265 - Small review update - li48 -> movptr - Merge branch 'master' into 8332265 - li48 ------------- Changes: https://git.openjdk.org/jdk/pull/19246/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=04 Stats: 212 lines in 8 files changed: 123 ins; 13 del; 76 mod Patch: https://git.openjdk.org/jdk/pull/19246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246 PR: https://git.openjdk.org/jdk/pull/19246 From sgibbons at openjdk.org Wed May 22 18:44:21 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 18:44:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v26] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 58 commits: - Merge branch 'openjdk:master' into indexof - un-helper-ize preload_needle_helper; try fix for macos build - Added header file - Adding exhaustive test - Added comments; move n-k<32 code up a level - Fixed CI compiles; re-factor UL processing - Addressing lots of comments. Interim commit. - Rearrange; add lambdas for clarity - Merge remote-tracking branch 'origin/master' into indexof - Move arrays_equals back to c2_MacroAssembler - ... 
and 48 more: https://git.openjdk.org/jdk/compare/37c47785...f4eefe1a ------------- Changes: https://git.openjdk.org/jdk/pull/16753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=25 Stats: 4303 lines in 16 files changed: 4140 ins; 26 del; 137 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 22 18:52:27 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 18:52:27 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v27] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Revert last change to IndexOf.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/f4eefe1a..ed4451d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=25-26 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Wed May 22 19:02:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 May 2024 19:02:02 GMT Subject: RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. Good and trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19351#pullrequestreview-2072127735 From dcubed at openjdk.org Wed May 22 19:37:01 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 22 May 2024 19:37:01 GMT Subject: RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. Thumbs up. I agree this is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19351#pullrequestreview-2072193937 From mseledtsov at openjdk.org Wed May 22 20:08:06 2024 From: mseledtsov at openjdk.org (Mikhailo Seledtsov) Date: Wed, 22 May 2024 20:08:06 GMT Subject: RFR: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19351#issuecomment-2125654753 From mseledtsov at openjdk.org Wed May 22 20:08:06 2024 From: mseledtsov at openjdk.org (Mikhailo Seledtsov) Date: Wed, 22 May 2024 20:08:06 GMT Subject: Integrated: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:00:58 GMT, Mikhailo Seledtsov wrote: > Please review this trivial problem listing change. This pull request has now been integrated. Changeset: 3d4185a9 Author: Mikhailo Seledtsov URL: https://git.openjdk.org/jdk/commit/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed Reviewed-by: kvn, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/19351 From djelinski at openjdk.org Wed May 22 20:33:23 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Wed, 22 May 2024 20:33:23 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code Message-ID: The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. 
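The over-alignment described above can be reproduced with plain integer arithmetic. The sketch below is not the HotSpot code: both helper names are hypothetical, and it assumes the padding was computed with the signed `%` operator, whose result in C++ takes the sign of the dividend. A mask with `modulus - 1` is shown as one possible bit-operation alternative.

```cpp
#include <cassert>

// Hypothetical: padding computed with the signed remainder.
// For a negative target the remainder is negative, so the
// result can exceed modulus - 1.
inline int padding_with_remainder(int modulus, int target) {
  int rem = target % modulus;
  return rem != 0 ? modulus - rem : 0;
}

// Hypothetical: padding computed with a mask.
// Requires modulus to be a power of two; the result is always
// in the range [0, modulus - 1], whatever the sign of target.
inline int padding_with_mask(int modulus, int target) {
  assert(modulus > 0 && (modulus & (modulus - 1)) == 0);
  return (modulus - (target & (modulus - 1))) & (modulus - 1);
}
```

For example, with a target of -31 (an address that is 1 past a 32-byte boundary once truncated to a signed int), the remainder form asks for 63 bytes while the mask form asks for 31.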
This PR addresses the problem by using bit operations to calculate the required number of bytes. Tier1-3 tests passed. On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. ------------- Commit messages: - Reduce guarantee to assert - Fix over-alignment Changes: https://git.openjdk.org/jdk/pull/19353/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19353&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332724 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19353/head:pull/19353 PR: https://git.openjdk.org/jdk/pull/19353 From kcr at openjdk.org Wed May 22 21:45:10 2024 From: kcr at openjdk.org (Kevin Rushforth) Date: Wed, 22 May 2024 21:45:10 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 13:38:25 GMT, Maurizio Cimadamore wrote: >> This PR implements [JEP 472](https://openjdk.org/jeps/472), by restricting the use of JNI in the following ways: >> >> * `System::load` and `System::loadLibrary` are now restricted methods >> * `Runtime::load` and `Runtime::loadLibrary` are now restricted methods >> * binding a JNI `native` method declaration to a native implementation is now considered a restricted operation >> >> This PR slightly changes the way in which the JDK deals with restricted methods, even for FFM API calls. 
In Java 22, the single `--enable-native-access` was used both to specify a set of modules for which native access should be allowed *and* to specify whether illegal native access (that is, native access occurring from a module not specified by `--enable-native-access`) should be treated as an error or a warning. More specifically, an error is only issued if the `--enable-native-access flag` is used at least once. >> >> Here, a new flag is introduced, namely `illegal-native-access=allow/warn/deny`, which is used to specify what should happen when access to a restricted method and/or functionality is found outside the set of modules specified with `--enable-native-access`. The default policy is `warn`, but users can select `allow` to suppress the warnings, or `deny` to cause `IllegalCallerException` to be thrown. This aligns the treatment of restricted methods with other mechanisms, such as `--illegal-access` and the more recent `--sun-misc-unsafe-memory-access`. >> >> Some changes were required in the package-info javadoc for `java.lang.foreign`, to reflect the changes in the command line flags described above. > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I tested this with JavaFX and everything is working as I would expect. Without any options, I get the expected warnings, one time per modules for the three `javafx.*` modules that use JNI. If I pass the `--enable-native-access` options at runtime, listing those three modules, there is no warning. Further, I confirm that if I pass that option to jlink or jpackage when creating a custom runtime, there is no warning. ------------- Marked as reviewed by kcr (Author). 
PR Review: https://git.openjdk.org/jdk/pull/19213#pullrequestreview-2072430338 From sgibbons at openjdk.org Wed May 22 21:45:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 22 May 2024 21:45:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v28] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Remove DO_EARLY_BAILOUT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/ed4451d1..027daf73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=27 - 
incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=26-27 Stats: 19 lines in 1 file changed: 0 ins; 19 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From iklam at openjdk.org Wed May 22 21:53:28 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 22 May 2024 21:53:28 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time Message-ID: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> This PR tries to store CONSTANT_FieldRef entries in the resolved state whenever it's safe to do so. I.e., when a constant pool entry in class `A` refers to a *non-static* field `B.F`, - `B` must be the same class as `A`; or - `B` is a supertype of `A`; or - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. Under these conditions, it's guaranteed that whenever `A` tries to use this entry at run time, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. 
Note that this ------------- Commit messages: - 8293980: Resolve CONSTANT_FieldRef at CDS dump time Changes: https://git.openjdk.org/jdk/pull/19355/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293980 Stats: 1134 lines in 30 files changed: 969 ins; 57 del; 108 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From dlong at openjdk.org Wed May 22 22:15:01 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 22 May 2024 22:15:01 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code In-Reply-To: References: Message-ID: On Wed, 22 May 2024 19:04:27 GMT, Daniel Jeliński wrote: > The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. > > This PR addresses the problem by using bit operations to calculate the required number of bytes. > > Tier1-3 tests passed. > > On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1166: > 1164: } > 1165: > 1166: void MacroAssembler::align(int modulus, int target) { How about making both parameters unsigned? 
And callers could be changed to something like: align(64, (uint)(uintptr_t)pc() & 63); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19353#discussion_r1610731612 From sgibbons at openjdk.org Thu May 23 01:29:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 01:29:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v29] In-Reply-To: References: Message-ID: <3Qow6_N97mxWzdMj2zmgj9MHmDWuIG4LYm_Lj4arxcg=.c8dba6ef-26bf-48e5-9a70-b010dcc8940b@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: 
https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/027daf73..42af0b50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=27-28 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Thu May 23 02:03:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 02:03:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v30] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > 
StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/42af0b50..40a1e628 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=28-29 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From liach at openjdk.org Thu May 23 03:33:10 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 23 May 2024 03:33:10 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes Message-ID: Please review this change that convert dynamic proxies implementations to hidden classes, intended to target JDK 24. Summary: 1. Adds new implementation while preserving the old implementation behind `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. 3. ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. Testing: tier1 and tier2 have no related failures. Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. 
------------- Commit messages: - Fixes - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy - First draft Changes: https://git.openjdk.org/jdk/pull/19356/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19356&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8242888 Stats: 303 lines in 8 files changed: 70 ins; 153 del; 80 mod Patch: https://git.openjdk.org/jdk/pull/19356.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19356/head:pull/19356 PR: https://git.openjdk.org/jdk/pull/19356 From iklam at openjdk.org Thu May 23 03:35:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 23 May 2024 03:35:19 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v2] In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: <7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. > > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. 
> > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - 8293980: Resolve CONSTANT_FieldRef at CDS dump time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19355/files - new: https://git.openjdk.org/jdk/pull/19355/files/6a3dc649..3900c568 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=00-01 Stats: 289 lines in 23 files changed: 123 ins; 139 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From iklam at openjdk.org Thu May 23 03:35:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 23 May 2024 03:35:19 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: <6z4OzxgiVgCOl0yboRFYbQ_sTDHplTN-w7gYAPixynY=.b8dc7862-054a-4e16-82ac-4c3cf72a8486@github.com> On Wed, 22 May 2024 21:48:44 GMT, Ioi Lam wrote: > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. 
> > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. > > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... I pressed the wrong button and sent out the RFR mail too soon .... I have finished updating the PR description text. It's ready for review now. 
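
For readers unfamiliar with the classlist format, the `@cp` line quoted in the description can be read as a class name followed by constant pool indices. The following is only a minimal sketch, assuming whitespace-separated fields; `CpEntry` and `parse_cp_line` are hypothetical names for illustration and are not HotSpot's `ClassListParser` API.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one "@cp" classlist line; not a HotSpot type.
struct CpEntry {
  std::string class_name;       // e.g. "java/util/Objects"
  std::vector<int> cp_indices;  // constant pool indices resolved in the training run
};

// Parses a line such as "@cp java/util/Objects 2 19 106".
// Returns std::nullopt if the line does not start with "@cp" or has no class name.
inline std::optional<CpEntry> parse_cp_line(const std::string& line) {
  std::istringstream in(line);
  std::string tag;
  CpEntry entry;
  if (!(in >> tag) || tag != "@cp") return std::nullopt;
  if (!(in >> entry.class_name)) return std::nullopt;
  int index;
  while (in >> index) {
    entry.cp_indices.push_back(index);
  }
  return entry;
}
```

The real parser then resolves each indicated entry; this sketch stops at extracting the indices.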
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19355#issuecomment-2126167774

From duke at openjdk.org Thu May 23 05:53:22 2024
From: duke at openjdk.org (kuaiwei)
Date: Thu, 23 May 2024 05:53:22 GMT
Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v3]
In-Reply-To:
References:
Message-ID:

> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues:
> 1 It shows a regression on some platforms, such as Apple silicon on macOS
> 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld"
>
> It can be fixed by:
> 1 Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we can see a performance improvement (N1 or N2)
> 2 Checking for the special pattern and merging the subsequent dmb.
>
> It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these.
>
> This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions but cannot merge all three, because when emitting dmb.ish, merging all previous dmbs would shrink the code buffer. I think it may break some assumptions, and I think it's not a common pattern.
>
> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging. But it looks risky to keep instructions pending during emission.
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Add more unit tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19278/files - new: https://git.openjdk.org/jdk/pull/19278/files/8767e8fa..6214b435 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=01-02 Stats: 7666 lines in 2 files changed: 7665 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278 PR: https://git.openjdk.org/jdk/pull/19278 From duke at openjdk.org Thu May 23 05:57:04 2024 From: duke at openjdk.org (kuaiwei) Date: Thu, 23 May 2024 05:57:04 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: <9h-ta3XTnzioy3Ghdeulm6FgZYDJb2y5mDdMLGw3oYc=.defe7ef1-15dd-451d-8b79-3688c1e7a1da@github.com> On Wed, 22 May 2024 10:25:55 GMT, Aleksey Shipilev wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Make MacroAssembler::merge more clear > > src/hotspot/cpu/aarch64/globals_aarch64.hpp line 127: > >> 125: product(ccstr, UseBranchProtection, "none", \ >> 126: "Branch Protection to use: none, standard, pac-ret") \ >> 127: product(bool, AlwaysMergeDMB, true, DIAGNOSTIC, \ > > Suggestion: > > product(bool, AlwaysMergeDMB, true, DIAGNOSTIC, \ Fixed > test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 93: > >> 91: } >> 92: >> 93: TEST_VM(AssemblerAArch64, merge_dmb) { > > Given the previous experience with barrier merges that prompted the backout, I would prefer to have a more comprehensive test here, maybe an additional one. I am thinking something like the exhaustive combination of 4 back-to-back barriers of each of 5 types. This gives us 5^4 = 625 test cases, which I think is still manageable. 
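
The exhaustive combination suggested in the quoted comment can be enumerated by counting in base 5. This is only a sketch of the enumeration idea: the five `Barrier` names below are assumptions for illustration, and `all_barrier_sequences` is a hypothetical helper, not the code of the test that was added to the PR.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Placeholder barrier kinds; the five names are assumptions for illustration.
enum class Barrier { LoadLoad, LoadStore, StoreLoad, StoreStore, AnyAny };
constexpr int kKinds = 5;   // number of barrier kinds
constexpr int kLength = 4;  // barriers emitted back to back

using Sequence = std::array<Barrier, kLength>;

// Enumerates every sequence of kLength barriers by counting in base kKinds.
inline std::vector<Sequence> all_barrier_sequences() {
  int total = 1;
  for (int i = 0; i < kLength; i++) total *= kKinds;  // 5^4 = 625
  std::vector<Sequence> result;
  result.reserve(total);
  for (int n = 0; n < total; n++) {
    Sequence seq;
    int rest = n;
    for (int pos = 0; pos < kLength; pos++) {
      seq[pos] = static_cast<Barrier>(rest % kKinds);  // digit 'pos' of n in base 5
      rest /= kKinds;
    }
    result.push_back(seq);
  }
  return result;
}
```

A test would then emit each sequence and compare the merged output against the expected instructions for that sequence.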
Test is added as merge_dmb_all_kinds ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1611041488 PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1611040971 From duke at openjdk.org Thu May 23 06:02:01 2024 From: duke at openjdk.org (kuaiwei) Date: Thu, 23 May 2024 06:02:01 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v3] In-Reply-To: References: <7eML4nr0XN1_QVOO_2tk-yXf8W578S4qb1kA3AoaU8w=.81b03ff5-7ba8-496d-acfe-285ba3de2004@github.com> Message-ID: On Tue, 21 May 2024 03:01:09 GMT, kuaiwei wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 7841: >> >>> 7839: ins_encode %{ >>> 7840: __ block_comment("membar_release"); >>> 7841: __ membar(Assembler::StoreStore); >> >> Do we need to respect `AlwaysMergeDMB`here? > > Yes, usually they can be merged in macroAssembler. but it can help to reduce the possibility of unmerged case. Thanks to point it. I checked code again. They will be merged if enable AlwaysMergeDMB. So we can skip the check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1611045449 From alanb at openjdk.org Thu May 23 06:15:00 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 23 May 2024 06:15:00 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > Please review this change that convert dynamic proxies implementations to hidden classes, intended to target JDK 24. > > Summary: > 1. Adds new implementation while preserving the old implementation behind `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. > 3. 
ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation.
>
> Testing: tier1 and tier2 have no related failures.
>
> Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package.

There are compatibility concerns and behavioural differences that will require significant effort to consider before doing this. This is the reason that this one has been kicked down the road several times.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2126310091

From alanb at openjdk.org Thu May 23 06:23:04 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 23 May 2024 06:23:04 GMT
Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 21:42:14 GMT, Kevin Rushforth wrote:

> Further, I confirm that if I pass that option to jlink or jpackage when creating a custom runtime, there is no warning.

Great! What about jpackage without a custom runtime, wondering if --java-options can be tested.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2126320311

From epeter at openjdk.org Thu May 23 06:24:06 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 23 May 2024 06:24:06 GMT
Subject: RFR: 8331193: Return references when possible in GrowableArray [v7]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 15:34:17 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined.
After the patch, we can use `at_grow` just as we use `at`. The same goes for `top`, `first`, and `last`.
>>
>> Some example code:
>> ```c++
>> // Before this patch this worked:
>> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s
>> int& x = arr.at(7);
>> if (x == -1) {
>>   x = 2;
>> }
>> assert(arr.at(7) == 2, "this holds");
>> // but this was forbidden
>> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E&
>> // so we had to do
>> int x = arr.at_grow(9, -1);
>> if (x == -1) {
>>   arr.at_put(9, 2);
>> }
>> ```
>>
>> Thanks.
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add test

Changes requested by epeter (Reviewer).

test/hotspot/gtest/utilities/test_growableArray.cpp line 669:

> 667: TEST(GrowableArrayCHeap, ReturningReferencesWorksAsExpected) {
> 668:   GrowableArrayCHeap arr(8, 8, -1); // Pre-fill with 8 -1s
> 669:   int& x = arr.at_grow(9, -1);

Suggestion:

    int& x = arr.at_grow(9, -1);
    EXPECT_EQ(-1, arr.at(9));
    EXPECT_EQ(-1, x);

test/hotspot/gtest/utilities/test_growableArray.cpp line 672:

> 670:   x = 2;
> 671:   EXPECT_EQ(2, arr.at(9));
> 672:   x = arr.top();

This is not using reference, right? I thought you can only use reference assignment when you declare the `x` variable. This here is using value, I think.
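
The point about reference assignment can be shown with standard C++ alone. This is a sketch that uses `std::vector` as a stand-in for `GrowableArray` (an assumption made so the example is self-contained); `reference_vs_value_demo` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Stand-in for GrowableArray: std::vector also returns references from at()/back().
inline void reference_vs_value_demo(std::vector<int>& arr) {
  int& x = arr.at(0);     // x is bound to element 0 at its declaration
  x = 2;                  // writes through the reference: element 0 becomes 2
  x = arr.back();         // does NOT rebind x; copies the last element's value into element 0

  int& top = arr.back();  // a new declaration is needed to refer to the last element
  top = 7;                // writes through to the last element
}
```

With an array holding `{-1, -1, 5}`, element 0 ends up holding a copy of 5 and the last element becomes 7, so a test of `top` as a reference needs its own `int&` declaration.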
-------------

PR Review: https://git.openjdk.org/jdk/pull/18975#pullrequestreview-2072954331
PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1611061861
PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1611063611

From epeter at openjdk.org Thu May 23 06:24:07 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 23 May 2024 06:24:07 GMT
Subject: RFR: 8331193: Return references when possible in GrowableArray [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 06:20:30 GMT, Emanuel Peter wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add test
>
> test/hotspot/gtest/utilities/test_growableArray.cpp line 672:
>
>> 670:   x = 2;
>> 671:   EXPECT_EQ(2, arr.at(9));
>> 672:   x = arr.top();
>
> This is not using reference, right? I thought you can only use reference assignment when you declare the `x` variable. This here is using value, I think.

Maybe that was your intention. But then you should have another test where you use `top` with reference.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1611064827

From stuefe at openjdk.org Thu May 23 06:59:18 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 23 May 2024 06:59:18 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 14:12:44 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo...
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Lower number of pages

More comments, but out of time. Will continue tomorrow or next week.

test/hotspot/gtest/nmt/test_vmatree.cpp line 67:

> 65: using Tree = VMATree;
> 66: using Node = Tree::TreapNode;
> 67: using NCS = NativeCallStackStorage;

Tree: not consistently used. I would prefer consistent use of VMATree, but if you alias, it should at least be used consistently.

NCS: not used at all.

If you need these definitions, can they not be somewhere at the start of the file?

test/hotspot/gtest/nmt/test_vmatree.cpp line 70:

> 68: NativeCallStackStorage ncs(true);
> 69: NativeCallStackStorage::StackIndex si1 = ncs.push(stack1);
> 70: NativeCallStackStorage::StackIndex si2 = ncs.push(stack2);

Should be provided by the fixture class, or as global statics.

test/hotspot/gtest/nmt/test_vmatree.cpp line 71:

> 69: NativeCallStackStorage::StackIndex si1 = ncs.push(stack1);
> 70: NativeCallStackStorage::StackIndex si2 = ncs.push(stack2);
> 71:

Please make these auto functions normal conventional functions. I really dislike that style.

It has real disadvantages. For example, no IDE I know can resolve a call graph across them, or even into them. E.g., in CDT, one of the most capable C++ IDEs, the call graph for VMATreeTest::in_type_of gives me .... nothing. I need to do a dumb full text search to find the call sites. And if the term to search for is very generic, I am out of luck. I know several developers that rely heavily on call graph search in CDT, I certainly do.

New techniques should serve a purpose. Lambdas as replacement for our old Closures makes sense, since they bring benefit (avoiding runtime polymorphy). But here I don't see the point.

test/hotspot/gtest/nmt/test_vmatree.cpp line 77:

> 75: for (int i = 0; i < 100; i++) {
> 76:   tree.reserve_mapping(i * 100, 100, rd);
> 77: }

No need to do this 100 times, and it somewhat obfuscates what you want to do here. Another problem with 100 times is that you seem to like EXPECT more than ASSERT, and on error we get 100+ error messages.

Comment is imprecise. Only 2 nodes *if properties match*.

Please change comment to something like "adjacent areas having the same properties should be merged", then simplify to something like "reserve 1, reserve 2, expect count 2"

test/hotspot/gtest/nmt/test_vmatree.cpp line 81:

> 79: treap(tree).visit_range_in_order(0, 999999, [&](Node* x) {
> 80:   found_nodes++;
> 81: });

Recurring pattern. Give the fixture class a `count_nodes()` and do it there. Ideally, with some sugar for asserts to keep testing code small, e.g. `assert_count(int expected_count)`.

test/hotspot/gtest/nmt/test_vmatree.cpp line 125:

> 123: for (int i = 0; i < 100; i++) {
> 124:   tree.release_mapping(i*100, 100);
> 125: }

If you feel you need the 100-times-reservation, write a helper function in the fixture. reserve_100() or so.

test/hotspot/gtest/nmt/test_vmatree.cpp line 126:

> 124:   tree.release_mapping(i*100, 100);
> 125: }
> 126: EXPECT_EQ(nullptr, treap_root(tree)) << "Releasing all memory should result in an empty tree";

Recurring. Give fixture something like assert_null_root()

test/hotspot/gtest/nmt/test_vmatree.cpp line 156:

> 154:   }
> 155:   i++;
> 156: });

This doesn't really test the state, nor the stack. It also seems to be a lot of code for a single-use test. Similar in other tests.

---

Proposal: since you have the recurring pattern of doing something with the tree, then checking its expected state, write a check function in the fixture (e.g. assert_tree_state() ) that does that.

Then use it like this:

    // Committing in middle of reservation ends with a sequence of 4 nodes
    TEST_VM_F(VMATreeTest, commit_in_middle_of_reservation) {
      ... reserve, commit in middle, then
      const AddressState expected_states[] = {
        { 0, .... }, { 25, .... }, { 50, .... }, { 100, .... }
      };
      assert_tree_state(expected_states, 4);
    }

And to preserve your sanity when constructing AddressState with its many components, you can add a helper that allows that in a human-friendly form.

For example: A string that encodes flag, stack, and reserved/committed state in three letters.

First letter: A-H, let this be one of 8 selected MEMFLAGS or - for mtNone
Second letter: a-d, let this be one of 4 selected stacks, or - for no stack/empty
Third letter: R or C, reserved, committed or none, or - for unreserved

Season to taste.

Then, e.g. for reserving 0..100, and committing 25..50, you write:

    const AddressState expected_states[] = {
      makestate(  0, "---", "AaR"),
      makestate( 25, "AaR", "AaC"),
      makestate( 50, "AaC", "AaR"),
      makestate(100, "AaR", "---")
    };
    assert_tree_state(expected_states, 4);

Now you can write a bunch of corner-case tests without being too verbose, and the intent is immediately clear. As an added bonus, it checks for tree state *exactly*, e.g. which nodes are part of the tree, which are not, in which order, etc.

If you reuse that string format for reserving etc, you could do this:

    reserve_or_commit(0, 100, "AaR");
    reserve_or_commit(25, 50, "AaC");

or just

    do(0, 100, "AaR");
    do(25, 50, "AaC");

From there on, one could even auto-build the expected state, but if too much automatism goes into the test, this increases the possibility for errors creeping in that one never finds because they make the test go green.

test/hotspot/gtest/nmt/test_vmatree.cpp line 351:

> 349: struct SimpleVMATracker : public CHeapObj {
> 350:   const size_t page_size = 4096;
> 351:   enum Tpe { Reserved, Committed, Free };

Wow, we save one letter! ;-)

test/hotspot/gtest/nmt/test_vmatree.cpp line 366:

> 364: };
> 365: // Page (4KiB) granular array
> 366: const size_t num_pages = 1024 * 512;

constexpr. And then use it :) see line below

test/hotspot/gtest/nmt/test_vmatree.cpp line 377:

> 375:
> 376: VMATree::SummaryDiff do_it(Tpe tpe, size_t start, size_t size, NativeCallStack stack, MEMFLAGS flag) {
> 377:   assert(size % page_size == 0 && start % page_size == 0, "page alignment");

use is_aligned

test/hotspot/gtest/nmt/test_vmatree.cpp line 440:

> 438: for (int i = 0; i < operation_count; i++) {
> 439:   const size_t page_start = (size_t)(os::random() % tr->num_pages);
> 440:   const size_t num_pages = (size_t)(os::random() % (tr->num_pages - page_start));

Proposal: roll dice for two positions in num_pages range. Then sort them (swap if A > B). That gives you a fairer distribution, since you don't overemphasize the end of the range.

test/hotspot/gtest/nmt/test_vmatree.cpp line 447:

> 445: const size_t size = num_pages * page_size;
> 446:
> 447: const MEMFLAGS flag = (MEMFLAGS)(os::random() % mt_number_of_types);

I would maybe scale down, not use mt_number_of_types. We want to stress merging, too. So the total number of possible states should not be that large. E.g. 8 states, with e.g. 4 flags and 2 stacks.
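
The "roll dice for two positions, then sort them" proposal above could look roughly like this. It is a sketch under stated assumptions: `PageRange` and `random_page_range` are hypothetical names, and a standard-library generator is used in place of `os::random()` so that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>

// Half-open page range [start, end); hypothetical helper type.
struct PageRange {
  std::size_t start;
  std::size_t end;
};

// Draws two positions uniformly in [0, num_pages] and orders them,
// instead of drawing a start and then a length from the remainder.
// The range may be empty when both draws coincide.
template <typename Rng>
PageRange random_page_range(Rng& rng, std::size_t num_pages) {
  std::uniform_int_distribution<std::size_t> pos(0, num_pages);
  std::size_t a = pos(rng);
  std::size_t b = pos(rng);
  if (a > b) std::swap(a, b);  // sort so that start <= end
  return PageRange{a, b};
}
```

A caller would multiply both ends by the page size to get the byte offsets passed to the tracker, and may want to skip empty ranges.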

-------------

PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2067634593
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611021414
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611024758
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1610959890
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611037411
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611035446
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611038979
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611039692
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611067145
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611069569
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611091143
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611075904
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611095608
PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611098358

From stuefe at openjdk.org Thu May 23 06:59:19 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 23 May 2024 06:59:19 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v95]
In-Reply-To:
References:
Message-ID:

On Mon, 20 May 2024 12:03:28 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>>
>> ## `MemoryFileTracker`
>>
>> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>>
>> ```c++
>> static MemoryFile* make_device(const char* descriptive_name);
>> static void free_device(MemoryFile* device);
>>
>> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>>                             MEMFLAGS flag, const NativeCallStack& stack);
>> static void free_memory(MemoryFile* device, size_t offset, size_t size);
>> ```
>>
>> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>>
>> ```c++
>> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
>> }
>> void ZNMT::commit(zoffset offset, size_t size) {
>>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
>> }
>> void ZNMT::uncommit(zoffset offset, size_t size) {
>>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
>> }
>>
>> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>>   // NMT doesn't track mappings at the moment.
>> }
>> ```
>>
>> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>>
>> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>>
>> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo...
>
> Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision:
>
>  - Remove unused include
>  - Basic tests for NativeCallStackStorage
>  - Allow for passing in nr of buckets
>  - Remove friend-ness

test/hotspot/gtest/nmt/test_nmt_treap.cpp line 81:

> 79: }
> 80: }
> 81:

All nodes are same-sized, no? So we don't have to track individual allocations. We can just count them. In the end, counter must be 0.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1607703860

From stuefe at openjdk.org Thu May 23 06:59:20 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 23 May 2024 06:59:20 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 04:40:31 GMT, Thomas Stuefe wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Lower number of pages
>
> test/hotspot/gtest/nmt/test_vmatree.cpp line 71:
>
>> 69: NativeCallStackStorage::StackIndex si1 = ncs.push(stack1);
>> 70: NativeCallStackStorage::StackIndex si2 = ncs.push(stack2);
>> 71:
>
> Please make these auto functions normal conventional functions. I really dislike that style.
>
> It has real disadvantages. For example, no IDE I know can resolve a call graph across them, or even into them. E.g., in CDT, one of the most capable C++ IDEs, the call graph for VMATreeTest::in_type_of gives me .... nothing. I need to do a dumb full text search to find the call sites. And if the term to search for is very generic, I am out of luck. I know several developers that rely heavily on call graph search in CDT, I certainly do.
>
> New techniques should serve a purpose. Lambdas as replacement for our old Closures makes sense, since they bring benefit (avoiding runtime polymorphy). But here I don't see the point.
Also, please split all of these scopes up into individual TESTs. A finer granularity allows you to reproduce individual TESTS without having to sit through 20+ irrelevant ones. It also makes for cleaner code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1610969221 From ayang at openjdk.org Thu May 23 07:04:09 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 23 May 2024 07:04:09 GMT Subject: RFR: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes [v3] In-Reply-To: <_OEn7BK9EykA6z5ARry8eu17tV3z3bS0jKZyw9huz74=.92b76c58-0941-41e5-86de-b430d902e8fd@github.com> References: <_OEn7BK9EykA6z5ARry8eu17tV3z3bS0jKZyw9huz74=.92b76c58-0941-41e5-86de-b430d902e8fd@github.com> Message-ID: <3s9Vk5eRFtNXVnxHNMUiOAyEwtRYKXefIlyvGgWN9Kc=.3529fc76-38fb-46b6-87e7-bce16842cecf@github.com> On Wed, 22 May 2024 14:27:16 GMT, Albert Mingkun Yang wrote: >> Trivial removing dead code. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19345#issuecomment-2126369782 From ayang at openjdk.org Thu May 23 07:04:10 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 23 May 2024 07:04:10 GMT Subject: Integrated: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes In-Reply-To: References: Message-ID: On Wed, 22 May 2024 09:23:46 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. This pull request has now been integrated. 
Changeset: 1e5a2780 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/1e5a2780d9cc8e73ce65bdccb98c1808aadd0784 Stats: 130 lines in 13 files changed: 0 ins; 128 del; 2 mod 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/19345 From alanb at openjdk.org Thu May 23 07:06:05 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 23 May 2024 07:06:05 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v7] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: <1vdMcmqpKuTGl0jRRa4_hI3ui2UZtZRHHZRsXzstuHc=.be4a18d0-03b9-451d-afff-d9da94539a1f@github.com> On Wed, 15 May 2024 20:29:17 GMT, Serguei Spitsyn wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. >> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. 
>> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. >> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: UNDO: removed incorrect simplification that removed a tmp local skipped Spec + code changes look okay. I didn't study the tests closely but I see you have updated the test coverage to ensure that virtual threads are not reported as the owner, waiting to enter, or waiting to be notified. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19030#pullrequestreview-2073032052 From mbaesken at openjdk.org Thu May 23 07:48:30 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 23 May 2024 07:48:30 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' [v2] In-Reply-To: References: Message-ID: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> > When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr > this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp . 
> > /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' > #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 > #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 > #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 > #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 > #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: adjust check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19349/files - new: https://git.openjdk.org/jdk/pull/19349/files/837757c9..56b0907b Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=19349&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19349&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19349.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19349/head:pull/19349 PR: https://git.openjdk.org/jdk/pull/19349 From mbaesken at openjdk.org Thu May 23 07:51:06 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 23 May 2024 07:51:06 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' [v2] In-Reply-To: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> References: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> Message-ID: On Thu, 23 May 2024 07:48:30 GMT, Matthias Baesken wrote: >> When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr >> this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp . 
>> >> /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' >> #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 >> #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 >> #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 >> #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 >> #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 >> #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust check Hi Coleen and Stefan, I adjusted/moved the if check . 
> Aside, I thought there was supposed to be a blank in between concatenated strings because some compiler complained. It is the same at a lot of places in the file so I did not change it here . ------------- PR Comment: https://git.openjdk.org/jdk/pull/19349#issuecomment-2126453795 From jsjolen at openjdk.org Thu May 23 07:51:33 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 07:51:33 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v8] In-Reply-To: References: Message-ID: <3zmMqPCsXnFQ4vcYfJQN7N-FiRychd24CTP3Wjg-B4E=.a75c79cd-3136-4bb0-ba43-23b678f744f1@github.com> > Hi, > > This PR introduces the possibility of using references more often when using GrowableArray, where as previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. > > > Some example code: > ```c++ > // Before this patch this worked: > GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s > int& x = arr.at(7); > if (x == -1) { > x = 2; > } > assert(arr.at(7) == 2, "this holds"); > // but this was forbidden > int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E& > // so we had to do > int x = arr.at_grow(9, -1); > if (x == -1) { > arr.at_put(9, 2); > } > ``` > > > Thanks.
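For readers without a HotSpot checkout, the effect of returning `E&` from `at_grow` and `top` can be sketched with a small stand-in container. `TinyGrowableArray` and `grow_and_patch` below are hypothetical names built on `std::vector`; they only mirror the method names mentioned in the description and are not the real `GrowableArray` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GrowableArray<E>; NOT the HotSpot class.
template <typename E>
class TinyGrowableArray {
  std::vector<E> _data;

 public:
  // Pre-fill the array with initial_len copies of filler.
  TinyGrowableArray(std::size_t initial_len, const E& filler)
      : _data(initial_len, filler) {}

  int length() const { return static_cast<int>(_data.size()); }

  // Bounds-checked access by reference.
  E& at(int i) {
    assert(0 <= i && i < length());
    return _data[static_cast<std::size_t>(i)];
  }

  // Make index i valid (new slots get filler), then return a reference to it.
  E& at_grow(int i, const E& filler) {
    assert(i >= 0);
    if (i >= length()) {
      _data.resize(static_cast<std::size_t>(i) + 1, filler);
    }
    return _data[static_cast<std::size_t>(i)];
  }

  // Reference to the last element.
  E& top() {
    assert(length() > 0);
    return _data.back();
  }
};

// The "grow, then patch through the reference" pattern from the description.
inline int grow_and_patch() {
  TinyGrowableArray<int> arr(8, -1);  // 8 slots, all -1
  int& x = arr.at_grow(9, -1);        // grows to length 10, binds to slot 9
  if (x == -1) {
    x = 2;                            // writes straight into the array
  }
  return arr.at(9);
}
```

One caveat of this sketch: a reference obtained before a later growing call may dangle, because `std::vector` can reallocate its storage. Whether the same caution applies to the real class depends on its allocation strategy, which is not shown in this thread.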
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/gtest/utilities/test_growableArray.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18975/files - new: https://git.openjdk.org/jdk/pull/18975/files/f71e2ce2..9042a111 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=06-07 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18975/head:pull/18975 PR: https://git.openjdk.org/jdk/pull/18975 From stefank at openjdk.org Thu May 23 08:15:01 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 May 2024 08:15:01 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' [v2] In-Reply-To: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> References: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> Message-ID: On Thu, 23 May 2024 07:48:30 GMT, Matthias Baesken wrote: >> When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr >> this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp .
>> >> /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' >> #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 >> #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 >> #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 >> #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 >> #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 >> #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust check Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19349#pullrequestreview-2073204123 From jsjolen at openjdk.org Thu May 23 08:24:03 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 08:24:03 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v7] In-Reply-To: References: Message-ID: <3LnIqKmM0yFRNLuS843yv1m0RiJXkZNp69aCcH69erQ=.b027e887-0360-4b4f-aff2-05c7b7c1736d@github.com> On Thu, 23 May 2024 06:21:48 GMT, Emanuel Peter wrote: >> test/hotspot/gtest/utilities/test_growableArray.cpp line 672: >> >>> 670: x = 2; >>> 671: EXPECT_EQ(2, arr.at(9)); >>> 672: x = arr.top(); >> >> This is not using reference, right? I thought you can only use reference assignment when you declare the `x` variable. This here is using value, I think. > > Maybe that was your intention. But then you should have another test where you use `top` with reference. It wasn't my intention, I just fumbled the semantics when writing the tests, thanks for the catch :-). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1611232880 From jsjolen at openjdk.org Thu May 23 08:28:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 08:28:14 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v9] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces the possibility of using references more often when using GrowableArray, where as previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. 
> > > Some example code: > ```c++ > // Before this patch this worked: > GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s > int& x = arr.at(7); > if (x == -1) { > x = 2; > } > assert(arr.at(7) == 2, "this holds"); > // but this was forbidden > int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E& > // so we had to do > int x = arr.at_grow(9, -1); > if (x == -1) { > arr.at_put(9, 2); > } > ``` > > > Thanks. Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/return-reference' into return-reference - Use references when using top() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18975/files - new: https://git.openjdk.org/jdk/pull/18975/files/9042a111..ff269e39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=07-08 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18975/head:pull/18975 PR: https://git.openjdk.org/jdk/pull/18975 From stuefe at openjdk.org Thu May 23 08:30:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 May 2024 08:30:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: <7mvAVR2Qfa10hYSXXxaL1yXpq6qbvvXFtqu-9-unCCk=.3802b0a1-8bc6-4f89-844a-affa2bf1788b@github.com> On Thu, 23 May 2024 06:24:30 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > test/hotspot/gtest/nmt/test_vmatree.cpp line 156: > >> 154: } >> 155: i++; >> 156: }); > > This doesn't really test the state, nor the stack. It also seems to be a lot of code for a single-use test. Similar in other tests.
> > --- > > Proposal: since you have the recurring pattern of doing something with the tree, then checking its expected state, write a check function in the fixture (e.g. assert_tree_state() ) that does that. > > Then use it like this: > > // Committing in middle of reservation ends with a sequence of 4 nodes > TEST_VM_F(VMATreeTest, commit_in_middle_of_reservation) { > > ... reserve, commit in middle, then > > const AddressState expected_states[] = { > { 0, .... }, > { 25, .... }, > { 50, .... }, > { 100, .... } }; > assert_tree_state(expected_states, 4); > } > > > And to preserve your sanity when constructing AddressState with its many components, you can add a helper that allows that in a human-friendly form. > > For example: > A string that encodes flag, stack, and reserved/committed state in three letters. > First letter: A-H, let this be one of 8 selected MEMFLAGS or - for mtNone > Second letter: a-d, let this be one of 4 selected stacks, or - for no stack/empty > Third letter: R or C for reserved or committed, or - for unreserved > Season to taste. > > Then, e.g. for reserving 0..100, and committing 25..50, you write: > > > const AddressState expected_states[] = { > makestate( 0, "---", "AaR"), > makestate(25, "AaR", "AaC"), > makestate(50, "AaC", "AaR"), > makestate(100, "AaR", "---") }; > assert_tree_state(expected_states, 4); > > > Now you can write a bunch of corner-case tests without being too verbose; the intent is immediately clear. As an added bonus, it checks for tree state *exactly*, e.g. which nodes are part of the tree, which are not, in which order, etc.
> > If you reuse that string format for reserving etc, you could do this: > > > reserve_or_commit(0, 100, "AaR"); > reserve_or_commit(25, 50, "AaC"); > > > or just > > > do(0, 100, "AaR"); > do(25, 50, "AaC"); > > > From there on, one could even auto-build the expected state, but if too much automatism goes into the test, this increases the possibility for errors creeping in one never finds because they make the test go green. While going for a walk after writing this, I realized an alternative to a string would be just to define those AddressState directly as global constants. static const AddressState AaR = .... With e.g. 4 flags, 2 stacks and 3 states, this would come to 24 states. With a macro, this could even be easier. Up to you. If you put those into an array, you can later on, in the random-tester-function, just choose a state randomly from that array by random index. You don't have to roll the dice three times to select flag, stack and state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611239093 From djelinski at openjdk.org Thu May 23 09:16:22 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 23 May 2024 09:16:22 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v2] In-Reply-To: References: Message-ID: > The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. > > This PR addresses the problem by using bit operations to calculate the required number of bytes. > > Tier1-3 tests passed.
> > On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. Daniel Jeliński has updated the pull request incrementally with two additional commits since the last revision: - Explicit typecasts - Change to unsigned instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19353/files - new: https://git.openjdk.org/jdk/pull/19353/files/1786e1cb..d0220193 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19353&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19353&range=00-01 Stats: 15 lines in 4 files changed: 0 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19353/head:pull/19353 PR: https://git.openjdk.org/jdk/pull/19353 From djelinski at openjdk.org Thu May 23 09:16:22 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 23 May 2024 09:16:22 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 22:12:38 GMT, Dean Long wrote: >> Daniel Jeliński has updated the pull request incrementally with two additional commits since the last revision: >> >> - Explicit typecasts >> - Change to unsigned instead > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1166: > >> 1164: } >> 1165: >> 1166: void MacroAssembler::align(int modulus, int target) { > > How about making both parameters unsigned? > And callers could be changed to something like: > > align(64, (uint)(uintptr_t)pc() & 63); Good idea. It also fixes a couple of conversion warnings. Updated.
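The over-alignment discussed in this thread can be reproduced outside the VM with plain integer arithmetic. The sketch below is an illustration only: `naive_padding` is a guess at the general shape of a signed remainder-based computation, not the original HotSpot code, and `padding_for` is a hypothetical helper showing the unsigned, mask-based alternative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signed variant. In C++ the remainder takes the sign of the
// dividend, so a negative target yields a negative remainder and the
// result overshoots by a full modulus.
inline int naive_padding(int modulus, int target) {
  int rem = target % modulus;
  return rem == 0 ? 0 : modulus - rem;
}

// Mask-based variant: number of padding bytes needed to bring offset up to
// the next multiple of modulus. modulus must be a power of two.
inline uint32_t padding_for(uint32_t modulus, uint32_t offset) {
  assert(modulus != 0 && (modulus & (modulus - 1)) == 0);
  // Unsigned subtraction wraps, and the mask keeps only the low bits,
  // so the result is always in the range [0, modulus - 1].
  return (modulus - offset) & (modulus - 1);
}
```

With `modulus = 32` and a target of `-31`, the signed variant asks for 63 bytes while the mask-based variant asks for 31, which matches the "32 NOP bytes too many" symptom in the PR description.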
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19353#discussion_r1611321244 From sspitsyn at openjdk.org Thu May 23 09:20:05 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 23 May 2024 09:20:05 GMT Subject: RFR: 8328083: degrade virtual thread support for GetObjectMonitorUsage [v7] In-Reply-To: References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 15 May 2024 20:29:17 GMT, Serguei Spitsyn wrote: >> The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. >> >> The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. >> >> `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. >> >> One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. 
>> >> The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. >> >> Also, please, review the related CSR and Release Note: >> - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage >> - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage >> >> Testing: >> - tested impacted and updated tests locally >> - tested with mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: UNDO: removed incorrect simplification that removed a tmp local skipped Thank you for review, Alan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19030#issuecomment-2126628322 From amitkumar at openjdk.org Thu May 23 09:33:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 May 2024 09:33:32 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v2] In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: > s390x port for recursive locking. 
> > testing: > - [x] build fastdebug-vm > - [x] build slowdebug-vm > - [x] build release-vm > - [x] build optimized-vm > - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (release-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] tier1 with fastdebug-vm > - [x] tier1 with slowdebug-vm > - [x] tier1 with release-vm > > *BenchMarks*: > > Results from Performance LPARs : > > > Locking Mode = 1 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.144 ± 0.035 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ± 89.475 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ± 0.559 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ± 3.036 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ± 1.793 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > Locking Mode = 1 (with patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.146 ± 0.027 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ± 75.863 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ± 0.519 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ± 2.103 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ± 2.229 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > > > > Locking Mode = 2 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 4.688 ± 0.051 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ± 92.265 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ± 2.229 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ±
0.416 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 424.241 ± 0.840 ns/op > Finished running test 'micro:vm.lang.Lo... Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestions from Axel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18878/files - new: https://git.openjdk.org/jdk/pull/18878/files/2cd05782..93826a09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=00-01 Stats: 45 lines in 2 files changed: 3 ins; 19 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/18878.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18878/head:pull/18878 PR: https://git.openjdk.org/jdk/pull/18878 From amitkumar at openjdk.org Thu May 23 09:56:27 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 May 2024 09:56:27 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v3] In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: > s390x port for recursive locking. > > testing: > - [x] build fastdebug-vm > - [x] build slowdebug-vm > - [x] build release-vm > - [x] build optimized-vm > - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (release-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] tier1 with fastdebug-vm > - [x] tier1 with slowdebug-vm > - [x] tier1 with release-vm > > *BenchMarks*: > > Results from Performance LPARs : > > > Locking Mode = 1 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.144 ±
0.035 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ± 89.475 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ± 0.559 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ± 3.036 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ± 1.793 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > Locking Mode = 1 (with patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.146 ± 0.027 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ± 75.863 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ± 0.519 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ± 2.103 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ± 2.229 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > > > > Locking Mode = 2 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 4.688 ± 0.051 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ± 92.265 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ± 2.229 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ± 0.416 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 424.241 ± 0.840 ns/op > Finished running test 'micro:vm.lang.Lo...
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: revert DiagnoseSyncOnValueBasedClasses changes from c1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18878/files - new: https://git.openjdk.org/jdk/pull/18878/files/93826a09..d91259bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18878.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18878/head:pull/18878 PR: https://git.openjdk.org/jdk/pull/18878 From rehn at openjdk.org Thu May 23 10:55:35 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 23 May 2024 10:55:35 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: > Hi, please consider! > > Materializing a 48-bit pointer can be done, using an additional register, with: > lui + lui + slli + add + addi > This is 15% faster than movptr(), both on VF2 and in CPU models. > > As we often materialize pointers during calls, there are free registers available. > > I have chosen just a few spots to use it; many more could use it. > E.g. la() with a tmp register can use li48 instead of movptr. > > Running tests now (so far so good); if I screwed up IC calls, it should show up quickly. > Benchmarks will follow when hardware is free.
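To make the `lui + lui + slli + add + addi` sequence concrete, here is a small host-side simulation. It is a sketch under stated assumptions: the helper names (`split48`, `materialize48`, `Parts`) are hypothetical, the 18/18/12-bit split is one decomposition that works for this instruction sequence, and it is not claimed to be the exact split used in the HotSpot patch.

```cpp
#include <cassert>
#include <cstdint>

// Simulated RV64 semantics for the instructions named in the description.
// lui: place a 20-bit immediate at bits 31..12 and sign-extend to 64 bits.
inline int64_t lui(uint32_t imm20) {
  assert(imm20 < (1u << 20));
  return static_cast<int64_t>(static_cast<int32_t>(imm20 << 12));
}
inline int64_t slli(int64_t x, unsigned shamt) {
  return static_cast<int64_t>(static_cast<uint64_t>(x) << shamt);
}
inline int64_t add(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) + static_cast<uint64_t>(b));
}
// addi: the immediate is a signed 12-bit value.
inline int64_t addi(int64_t x, int32_t imm12) {
  assert(-2048 <= imm12 && imm12 <= 2047);
  return add(x, imm12);
}

// Hypothetical split of a 48-bit address (an assumption of this sketch).
struct Parts {
  uint32_t upper18;  // goes into the temp register, shifted left by 18 later
  uint32_t mid18;    // goes directly into the destination register
  int32_t  low12;    // signed offset folded into the final addi
};

inline Parts split48(uint64_t addr) {
  assert(addr < (uint64_t(1) << 48));
  // Sign-extend the low 12 bits so they fit an addi immediate.
  int32_t low12 = static_cast<int32_t>((addr & 0xfff) ^ 0x800) - 0x800;
  // Compensate for a negative low12 by carrying into the upper bits.
  uint64_t rest = addr - static_cast<uint64_t>(static_cast<int64_t>(low12));
  Parts p;
  p.upper18 = static_cast<uint32_t>(rest >> 30);  // may reach 1 << 18 after the carry
  p.mid18   = static_cast<uint32_t>((rest >> 12) & 0x3ffff);
  p.low12   = low12;
  return p;
}

// Five simulated instructions: lui, lui, slli, add, addi.
inline uint64_t materialize48(uint64_t addr) {
  Parts p = split48(addr);
  int64_t tmp = lui(p.upper18);
  int64_t rd  = lui(p.mid18);
  tmp = slli(tmp, 18);
  rd  = add(rd, tmp);
  rd  = addi(rd, p.low12);
  return static_cast<uint64_t>(rd);
}
```

The two `lui` instructions do not depend on each other, which is presumably where the benefit over a single-register dependency chain comes from; the thread itself only reports the measured 15%.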
Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: - Fixed more comments - Fixed comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19246/files - new: https://git.openjdk.org/jdk/pull/19246/files/9134d4e8..cf4c3066 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=04-05 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246 PR: https://git.openjdk.org/jdk/pull/19246 From luhenry at openjdk.org Thu May 23 10:55:36 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 23 May 2024 10:55:36 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v5] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:41:16 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer can be done, using an additional register, with: >> lui + lui + slli + add + addi >> This is 15% faster than movptr(), both on VF2 and in CPU models. >> >> As we often materialize pointers during calls, there are free registers available. >> >> I have chosen just a few spots to use it; many more could use it. >> E.g. la() with a tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good); if I screwed up IC calls, it should show up quickly. >> Benchmarks will follow when hardware is free. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains ten commits: > > - Merge branch 'master' into 8332265 > - More review comments > - Review changes > - Merge branch 'master' into 8332265 > - Merge branch 'master' into 8332265 > - Small review update > - li48 -> movptr > - Merge branch 'master' into 8332265 > - li48 src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3734: > 3732: > 3733: int MacroAssembler::static_call_stub_size() { > 3734: // (lui, addi, slli, addi, slli, addi) + (lui + lui + ssli + add) + jalr Instead of `ssli`, shouldn't it be `slli`? src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 402: > 400: } else if (is_movptr2_at(instruction_address())) { > 401: if (is_addi_at(addr_at(movptr2_instruction_size - NativeInstruction::instruction_size))) { > 402: // Assume: lui, addi, slli, addi, slli, addi If it's a `movptr2` here, the `Assume: lui, addi, slli, addi, slli, addi` comment seems wrong. Same at https://github.com/openjdk/jdk/pull/19246/files#diff-cadab323d5b577bec017d8ed262bff9d2318e38c0a62afe567050e86ff62cbb9R405 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1611455331 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1611459701 From rehn at openjdk.org Thu May 23 10:55:36 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 23 May 2024 10:55:36 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v5] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:46:15 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: >> >> - Merge branch 'master' into 8332265 >> - More review comments >> - Review changes >> - Merge branch 'master' into 8332265 >> - Merge branch 'master' into 8332265 >> - Small review update >> - li48 -> movptr >> - Merge branch 'master' into 8332265 >> - li48 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3734: > >> 3732: >> 3733: int MacroAssembler::static_call_stub_size() { >> 3734: // (lui, addi, slli, addi, slli, addi) + (lui + lui + ssli + add) + jalr > > Instead of `ssli`, shouldn't it be `slli`? Yes, for some reason I keep writing ssli instead slli :) > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 402: > >> 400: } else if (is_movptr2_at(instruction_address())) { >> 401: if (is_addi_at(addr_at(movptr2_instruction_size - NativeInstruction::instruction_size))) { >> 402: // Assume: lui, addi, slli, addi, slli, addi > > If it's a `movptr2` here, the `Assume: lui, addi, slli, addi, slli, addi` comment seems wrong. Same at https://github.com/openjdk/jdk/pull/19246/files#diff-cadab323d5b577bec017d8ed262bff9d2318e38c0a62afe567050e86ff62cbb9R405 Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1611458151 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1611462685 From rehn at openjdk.org Thu May 23 10:55:36 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 23 May 2024 10:55:36 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v5] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:48:45 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3734: >> >>> 3732: >>> 3733: int MacroAssembler::static_call_stub_size() { >>> 3734: // (lui, addi, slli, addi, slli, addi) + (lui + lui + ssli + add) + jalr >> >> Instead of `ssli`, shouldn't it be `slli`? > > Yes, for some reason I keep writing ssli instead slli :) I found one more :) Pushed. 
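For reference, the five-instruction sequence discussed in this thread (lui + lui + slli + add + addi) can be modelled in plain C++. This is only a sketch under assumptions: the names `Movptr2Parts`, `split48` and `materialize48` are hypothetical, and the 18/18/12-bit split shown here is one decomposition that works arithmetically, not necessarily the one `movptr2` in macroAssembler_riscv.cpp uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical immediates, one per field of a 48-bit constant.
struct Movptr2Parts {
  int64_t upper;  // immediate for "lui rd", later shifted left by 18
  int64_t mid;    // immediate for "lui tmp"
  int64_t lower;  // signed 12-bit immediate for the final "addi"
};

// Assumed split, not taken from the HotSpot sources.
inline Movptr2Parts split48(uint64_t addr) {
  assert(addr < (UINT64_C(1) << 48) && "constant must fit in 48 bits");
  // addi sign-extends its 12-bit immediate, so treat the low 12 bits as signed.
  int64_t lower = static_cast<int64_t>(addr & 0xFFF);
  if (lower >= 0x800) lower -= 0x1000;
  // The remainder is a multiple of 4096 and absorbs the borrow from 'lower'.
  int64_t rest = (static_cast<int64_t>(addr) - lower) >> 12;  // at most 2^36
  Movptr2Parts p;
  p.mid = rest & 0x3FFFF;  // 18 bits, so bit 19 of the lui immediate stays clear
  p.upper = rest >> 18;    // at most 2^18, still a positive 20-bit immediate
  p.lower = lower;
  return p;
}

// Replays the five instructions on 64-bit integers.
inline uint64_t materialize48(uint64_t addr) {
  Movptr2Parts p = split48(addr);
  int64_t rd = p.upper << 12;   // lui  rd,  upper
  int64_t tmp = p.mid << 12;    // lui  tmp, mid
  rd <<= 18;                    // slli rd,  rd, 18
  rd += tmp;                    // add  rd,  rd, tmp
  rd += p.lower;                // addi rd,  rd, lower
  return static_cast<uint64_t>(rd);
}
```

The two lui instructions do not depend on each other, which is presumably where the gain over the serial lui/addi/slli chain comes from; that reading is an inference from the PR description, not a measured claim.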
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1611459873 From luhenry at openjdk.org Thu May 23 10:55:36 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 23 May 2024 10:55:36 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:52:35 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed more comments > - Fixed comments src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 386: > 384: void set_data(intptr_t x); > 385: > 386: void flush() { why the removal? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1611460178 From rehn at openjdk.org Thu May 23 10:55:36 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 23 May 2024 10:55:36 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:50:32 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fixed more comments >> - Fixed comments > > src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 386: > >> 384: void set_data(intptr_t x); >> 385: >> 386: void flush() { > > why the removal? It was unused, and I had review comment in there. 
It was suggested to be removed instead of fixing the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1611460936 From luhenry at openjdk.org Thu May 23 10:58:09 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 23 May 2024 10:58:09 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:55:35 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed more comments > - Fixed comments Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2073606238 From shade at openjdk.org Thu May 23 10:58:04 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 May 2024 10:58:04 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: <9h-ta3XTnzioy3Ghdeulm6FgZYDJb2y5mDdMLGw3oYc=.defe7ef1-15dd-451d-8b79-3688c1e7a1da@github.com> References: <9h-ta3XTnzioy3Ghdeulm6FgZYDJb2y5mDdMLGw3oYc=.defe7ef1-15dd-451d-8b79-3688c1e7a1da@github.com> Message-ID: On Thu, 23 May 2024 05:53:25 GMT, kuaiwei wrote: >> test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 93: >> >>> 91: } >>> 92: >>> 93: TEST_VM(AssemblerAArch64, merge_dmb) { >> >> Given the previous experience with barrier merges that prompted the backout, I would prefer to have a more comprehensive test here, maybe an additional one. I am thinking something like the exhaustive combination of 4 back-to-back barriers of each of 5 types. This gives us 5^4 = 625 test cases, which I think is still manageable. > > Test is added as merge_dmb_all_kinds Right. I was implicitly thinking that we can do this without coding the explicit patterns into the test. As it stands now, it is hard to check that generated patterns are actually correct. Let me see if I can whip up a sample of what I had in mind. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1611466284 From rehn at openjdk.org Thu May 23 11:04:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 23 May 2024 11:04:07 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:55:35 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). 
>> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed more comments > - Fixed comments Thanks again! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2126825469 From liach at openjdk.org Thu May 23 11:28:01 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 23 May 2024 11:28:01 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > Please review this change that convert dynamic proxies implementations to hidden classes, intended to target JDK 24. > > Summary: > 1. Adds new implementation while preserving the old implementation behind `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. > 3. ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. > > Testing: tier1 and tier2 have no related failures. > > Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. 
A CSR targeting 24 describing the compatibility concerns and behavioral differences is here, somehow not linked by skara: https://bugs.openjdk.org/browse/JDK-8332770 The incompatibilities were much greater in the previous iterations of this issue, such as in dynamic modules, serialization, and in proxy class protection domain. Now these aspects are addressed by this patch, the only real one left is the change in stack trace. Feel free to raise other incompatibilities you have discovered. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2126869679 From alanb at openjdk.org Thu May 23 11:39:02 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 23 May 2024 11:39:02 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 11:25:00 GMT, Chen Liang wrote: > A CSR targeting 24 describing the compatibility concerns and behavioral differences is here, somehow not linked by skara: https://bugs.openjdk.org/browse/JDK-8332770 The incompatibilities were much greater in the previous iterations of this issue, such as in dynamic modules, serialization, and in proxy class protection domain. Now these aspects are addressed by this patch, the only real one left is the change in stack trace. Feel free to raise other incompatibilities you have discovered. Thanks for starting a CSR. The CSR can't be low risk, it's medium at least, maybe high. If we are doing this change then doing it early in a release and putting into outreach to frameworks will be important. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2126890063 From forax at univ-mlv.fr Thu May 23 11:43:38 2024 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 23 May 2024 13:43:38 +0200 (CEST) Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: <1193071457.29792413.1716464618330.JavaMail.zimbra@univ-eiffel.fr> ----- Original Message ----- > From: "Chen Liang" > To: "core-libs-dev" , "hotspot-dev" , kulla-dev at openjdk.org > Sent: Thursday, May 23, 2024 1:28:01 PM > Subject: Re: RFR: 8242888: Convert dynamic proxy to hidden classes > On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > >> Please review this change that convert dynamic proxies implementations to hidden >> classes, intended to target JDK 24. >> >> Summary: >> 1. Adds new implementation while preserving the old implementation behind >> `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. >> 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in >> native code; I updated native code to reuse that ClassLoader for Proxy support. >> 3. ProxyGenerator changes mainly involve using Class data to pass Method list >> (accessed in a single condy) and removal of obsolete setup code generation. >> >> Testing: tier1 and tier2 have no related failures. >> >> Comment: Since #8278, Proxy has been converted to ClassFile API, and >> infrastructure has changed; now, the migration to hidden classes is much >> cleaner and has less impact, such as preserving ProtectionDomain and dynamic >> module without "anchor classes", and avoiding java.lang.invoke package. > > A CSR targeting 24 describing the compatibility concerns and behavioral > differences is here, somehow not linked by skara: > https://bugs.openjdk.org/browse/JDK-8332770 > The incompatibilities were much greater in the previous iterations of this > issue, such as in dynamic modules, serialization, and in proxy class protection > domain. 
Now these aspects are addressed by this patch, the only real one left > is the change in stack trace. Feel free to raise other incompatibilities you > have discovered. I wonder if instead of using hidden classes, we should not use usual named classes and add a new Lookup.defineClass() that takes a classData as parameter. This will solve both the problem of the stacktrace and the problem of the roundtrip proxyClass != Class.forName(proxyClass.getName()). Rémi > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2126869679 From dlong at openjdk.org Thu May 23 11:53:01 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 23 May 2024 11:53:01 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v2] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 09:16:22 GMT, Daniel Jeliński wrote: >> The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. >> >> This PR addresses the problem by using bit operations to calculate the required number of bytes. >> >> Tier1-3 tests passed. >> >> On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. > > Daniel Jeliński has updated the pull request incrementally with two additional commits since the last revision: > > - Explicit typecasts > - Change to unsigned instead Looks good. ------------- Marked as reviewed by dlong (Reviewer).
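The over-alignment described in the 8332724 description above can be reproduced with ordinary integer arithmetic. The sketch below is an assumption about the failure mode, not a copy of the MacroAssembler code: `padding_signed` and `padding_bitops` are hypothetical names, and the first function only illustrates how a signed remainder can yield 63 bytes of padding where 31 would do.

```cpp
#include <cassert>
#include <cstdint>

// Assumed shape of the problem: in C++ the result of '%' takes the sign of
// the dividend, so a negative target gives a negative remainder and the
// subtraction below overshoots by one full modulus.
inline int padding_signed(int modulus, int target) {
  int rem = target % modulus;
  return rem != 0 ? modulus - rem : 0;
}

// Bit-operation variant on unsigned values; needs a power-of-two modulus.
// (0 - target) & (modulus - 1) is the distance to the next boundary.
inline uint32_t padding_bitops(uint32_t modulus, uint32_t target) {
  assert(modulus != 0 && (modulus & (modulus - 1)) == 0 && "power of two expected");
  return (0u - target) & (modulus - 1);
}
```

With a target whose 32-bit signed view is -31, the signed version asks for 63 bytes while the unsigned bit-operation version asks for 31, which matches the "32 NOP bytes too many" reported for align32.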
PR Review: https://git.openjdk.org/jdk/pull/19353#pullrequestreview-2073741788 From sspitsyn at openjdk.org Thu May 23 12:10:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 23 May 2024 12:10:08 GMT Subject: Integrated: 8328083: degrade virtual thread support for GetObjectMonitorUsage In-Reply-To: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> References: <-lAT5GzHVRrOUJhhMLfV5CkkPA3DHHDUZfdE7CBOcHg=.ecb91a2d-82c2-4e94-a1f6-f84d7a8c2a87@github.com> Message-ID: On Wed, 1 May 2024 10:20:52 GMT, Serguei Spitsyn wrote: > The fix is to degrade virtual threads support in the JVM TI `GetObjectMonitorUsage` function so that it is specified to only return an owner when the owner is a platform thread. Also, virtual threads are not listed in the both `waiters` and `notify_waiters` lists returned in the `jvmtiMonitorUsage` structure. Java 19 re-specified a number of JVMTI functions and events for virtual threads, we missed this one. > > The main motivation for degrading it now is that the object monitor implementation is being updated to allow virtual threads unmount while owning monitors. It would add overhead to record monitor usage when freezing/unmount, overhead that couldn't be tied to a JVMTI capability as the capability can be enabled at any time. > > `GetObjectMonitorUsage` was broken for 20+ years ([8247972](https://bugs.openjdk.org/browse/JDK-8247972)) without bug reports so it seems unlikely that the function is widely used. Degrading it to only return an owner when the owner is a platform thread has no compatibility impact for tooling that uses it in conjunction with `HotSpot` thread dumps or `ThreadMXBean`. > > One other point about `GetObjectMonitorUsage` is that it pre-dates j.u.concurrent in Java 5 so it can't be used to get a full picture of the lock usage in a program. 
> > The specs of the impacted `JDWP ObjectReference.MonitorInfo` command and the JDI `ObjectReference` `ownerThread()`, `waitingThreads()` and `entryCount()` methods are updated to match the JVM TI spec. > > Also, please, review the related CSR and Release Note: > - CSR: [8331422](https://bugs.openjdk.org/browse/JDK-8331422): degrade virtual thread support for GetObjectMonitorUsage > - RN: [8331465](https://bugs.openjdk.org/browse/JDK-8331465): Release Note: degrade virtual thread support for GetObjectMonitorUsage > > Testing: > - tested impacted and updated tests locally > - tested with mach5 tiers 1-6 This pull request has now been integrated. Changeset: b890336e Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/b890336e111ea8473ae49e9992bc2fd61e716792 Stats: 188 lines in 12 files changed: 131 ins; 2 del; 55 mod 8328083: degrade virtual thread support for GetObjectMonitorUsage Reviewed-by: cjplummer, alanb ------------- PR: https://git.openjdk.org/jdk/pull/19030 From jsjolen at openjdk.org Thu May 23 12:34:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 23 May 2024 12:34:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 17:18:10 GMT, Gerard Ziemski wrote: >> src/hotspot/share/nmt/nmtTreap.hpp line 236: >> >>> 234: } >>> 235: >>> 236: void upsert(const K& k, const V& v) { >> >> Could we rename this to simply `add()` instead of `upsert()` ? > > I would take `insert()` over `upsert()` too, if you don't like `add()` :-) I really prefer `upsert` over `insert`, the point is to show that it works for both *up*dating and in*sert*ing! It is pretty well established in database terminology. I can go with `add` if you're not comfortable with `upsert`.
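To make the update-or-insert meaning of `upsert` concrete, here is a minimal sketch. It is not the NMT treap: `TinyOrderedMap` is a hypothetical stand-in built on `std::map` (C++17 `insert_or_assign`), and only the `void upsert(const K& k, const V& v)` signature is taken from the quoted nmtTreap.hpp line.

```cpp
#include <cstddef>
#include <map>

// Hypothetical stand-in for an ordered container with upsert semantics.
template <typename K, typename V>
class TinyOrderedMap {
  std::map<K, V> _entries;

public:
  // If k is already present its value is replaced, otherwise (k, v) is added.
  void upsert(const K& k, const V& v) {
    _entries.insert_or_assign(k, v);
  }

  // Returns a pointer to the stored value, or nullptr if k is absent.
  const V* find(const K& k) const {
    auto it = _entries.find(k);
    return it == _entries.end() ? nullptr : &it->second;
  }

  std::size_t size() const { return _entries.size(); }
};
```

Calling `upsert(1, 10)` followed by `upsert(1, 20)` leaves a single entry holding 20, which is the behaviour that neither `add` nor `insert` spells out by name.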
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611609405 From mdoerr at openjdk.org Thu May 23 12:39:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 23 May 2024 12:39:02 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' [v2] In-Reply-To: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> References: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> Message-ID: On Thu, 23 May 2024 07:48:30 GMT, Matthias Baesken wrote: >> When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr >> this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp . >> >> /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' >> #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 >> #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0x7fed0b7b1681 in VMThread::loop() 
/jdk/src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 >> #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 >> #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 >> #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 >> #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust check LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19349#pullrequestreview-2073842683 From stuefe at openjdk.org Thu May 23 12:40:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 May 2024 12:40:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 12:31:04 GMT, Johan Sjölen wrote: >> I would take `insert()` over `upsert()` too, if you don't like `add()` :-) > > I really prefer `upsert` over `insert`, the point is to show that it works for both *up*dating and in*sert*ing! It is pretty well established in database terminology. I can go with `add` if you're not comfortable with `upsert`. +1 for upsert. It's an established term.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611618728 From erikj at openjdk.org Thu May 23 12:40:07 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Thu, 23 May 2024 12:40:07 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v2] In-Reply-To: <7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> Message-ID: <1vK1vlFb91j3ilWmbMRr3PHu14ZkI8fWwU-JV4CcsQ0=.03c848b2-5424-4a6e-89fb-4b774d293fc1@github.com> On Thu, 23 May 2024 03:35:19 GMT, Ioi Lam wrote: >> ### Overview >> >> This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. >> >> I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, >> - `B` is the same class as `A`; or >> - `B` is a supertype of `A`; or >> - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. >> >> Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. >> >> Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. >> >> (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) >> >> ### Static CDS Archive >> >> This feature is implemented in three steps for static CDS archive dump: >> >> 1. 
At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: >> >> @cp java/util/Objects 2 19 106 >> >> 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. >> >> 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. >> >> ### Dynamic CDS Archive >> >> When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. >> >> ### Limitations >> >> - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. >> - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8293980-resolve-fields-at-dumptime > - 8293980: Resolve CONSTANT_FieldRef at CDS dump time Build change looks good. ------------- Marked as reviewed by erikj (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19355#pullrequestreview-2073844617 From jsjolen at openjdk.org Thu May 23 12:44:13 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 23 May 2024 12:44:13 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 17:22:08 GMT, Gerard Ziemski wrote: > We claim that: > > > Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. > > May I ask how you ran it? I would like to be able to reproduce our claim. Sure, it was a while since I ran the benchmark. You're going to have to do a bit of work here, to get it working. You take this file: https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L1 And you port it to the VMATree instead of VMADict (or whatever it's called). Then you run it and look at output. You could also take one of the stress tests that I made, remove the verification calls, and run the same stress test for VirtualMemoryTracker. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2127011316 From amitkumar at openjdk.org Thu May 23 12:49:16 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 May 2024 12:49:16 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v4] In-Reply-To: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: > s390x port for recursive locking.
> > testing: > - [x] build fastdebug-vm > - [x] build slowdebug-vm > - [x] build release-vm > - [x] build optimized-vm > - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (release-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) > - [x] with C1 > - [x] with C2 > - [x] with interpreter > - [x] tier1 with fastdebug-vm > - [x] tier1 with slowdebug-vm > - [x] tier1 with release-vm > > *BenchMarks*: > > Results from Performance LPARs : > > > Locking Mode = 1 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.144 ± 0.035 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ± 89.475 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ± 0.559 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ± 3.036 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ± 1.793 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > Locking Mode = 1 (with patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 5.146 ± 0.027 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ± 75.863 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ± 0.519 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ± 2.103 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ± 2.229 ns/op > Finished running test 'micro:vm.lang.LockUnlock' > > > > > Locking Mode = 2 (without Patch) > > Benchmark (innerCount) Mode Cnt Score Error Units > LockUnlock.testContendedLock 100 avgt 12 4.688 ± 0.051 ns/op > LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ± 92.265 ns/op > LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ± 2.229 ns/op > LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ±
0.416 ns/op > LockUnlock.testSimpleLockUnlock 100 avgt 12 424.241 ± 0.840 ns/op > Finished running test 'micro:vm.lang.Lo... Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: minor code formatting & variable renamings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18878/files - new: https://git.openjdk.org/jdk/pull/18878/files/d91259bb..2584484d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18878&range=02-03 Stats: 52 lines in 1 file changed: 9 ins; 0 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/18878.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18878/head:pull/18878 PR: https://git.openjdk.org/jdk/pull/18878 From amitkumar at openjdk.org Thu May 23 12:52:01 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 May 2024 12:52:01 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v4] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> <8ftPbjSfPRGU8ibdxLD7cxBsC0U26dJgZf8IzHdK0ng=.e3769ecf-87b2-46a5-98bd-22a27a068be0@github.com> Message-ID: On Wed, 22 May 2024 07:14:59 GMT, Axel Boldt-Christmas wrote: >> The current code is fine, but that comment made me wonder why preserving the original top value was important. My thinking was that you could only change the assert snippet as follows: >> >> #ifdef ASSERT >> NearLabel check_done; >> + NearLabel loop; >> + z_lgf(top, Address(Z_thread, JavaThread::lock_stack_top_offset())); >> + bind(loop); >> z_aghi(top, -oopSize); >> compareU32_and_branch(top, in_bytes(JavaThread::lock_stack_base_offset()), >> bcondLow, check_done); >> z_cg(obj, Address(Z_thread, top)); >> - z_brne(inflated); >> + z_brne(loop); >> stop("Fast Unlock lock on stack"); >> bind(check_done); >> #endif // ASSERT >> >> >> then remove the comment and use either whatever register.
> > I still do not think I understand what ` if we load top there then it could result into infinite loop` is referring to. @xmas92 I have updated the code, please have a second look. Thanks, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1611636785 From heidinga at openjdk.org Thu May 23 13:06:09 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Thu, 23 May 2024 13:06:09 GMT Subject: RFR: 8332745: Method::is_vanilla_constructor is never used Message-ID: Removed dead code related to identifying empty constructors. Missed when [JDK-8057777](https://bugs.openjdk.org/browse/JDK-8057777) cleaned up JVM_AllocateNewObject. Passes mach5 tier1. ------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into 8332745 - 8332745: Method::is_vanilla_constructor is never used Changes: https://git.openjdk.org/jdk/pull/19367/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19367&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332745 Stats: 78 lines in 5 files changed: 0 ins; 76 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19367/head:pull/19367 PR: https://git.openjdk.org/jdk/pull/19367 From stefank at openjdk.org Thu May 23 13:08:07 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 May 2024 13:08:07 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> On Wed, 22 May 2024 12:09:14 GMT, Afshin Zafari wrote: >> This PR fixes the 
problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. >> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fixed the missing parts of shenandoahHeap.cpp I've left an initial set of comments. src/hotspot/os/posix/os_posix.cpp line 386: > 384: if (begin_offset > 0) { > 385: if (os::release_memory(extra_base, begin_offset)) > 386: { The `{` should be moved to the line above. src/hotspot/os/posix/os_posix.cpp line 387: > 385: if (os::release_memory(extra_base, begin_offset)) > 386: { > 387: ThreadCritical tc; In many of the functions we put the `ThreadCritical` inside the `MemTracker` after the `enabled()` check, but we don't do it here. 
Why is that? Shouldn't the `ThreadCritical` usage be hidden inside `MemTracker`? src/hotspot/share/cds/metaspaceShared.cpp line 1088: > 1086: #endif // ASSERT > 1087: > 1088: if (archive_space_rs.is_reserved()) { We've already asserted that this should be true, so this if should not be needed. src/hotspot/share/cds/metaspaceShared.cpp line 1092: > 1090: p2i(archive_space_rs.base()), p2i(archive_space_rs.end()), archive_space_rs.size()); > 1091: } > 1092: if (class_space_rs.is_reserved()) { `class_space_rs.is_reserved()` is asserted if `if (Metaspace::using_class_space())` is taken. I think this could be changed to: Suggestion: if (Metaspace::using_class_space()) { src/hotspot/share/cds/metaspaceShared.cpp line 1341: > 1339: } else { > 1340: if (use_archive_base_addr && base_address != nullptr) { > 1341: total_space_rs = ReservedSpace(total_range_size, archive_space_alignment, Can you explain why you changed this? It's also interesting that after this change we only use `base_address_alignment` in asserts. I think this indicates that something should be cleaned up / fixed here. src/hotspot/share/cds/metaspaceShared.cpp line 1370: > 1368: ccs_begin_offset, mtClassShared, mtClass); > 1369: } > 1370: assert(archive_space_rs.is_reserved(), "Archive space is not reserved."); Something is dubious about the code above: archive_space_rs = total_space_rs.first_part(ccs_begin_offset, (size_t)archive_space_alignment); class_space_rs = total_space_rs.last_part(ccs_begin_offset); MemTracker::record_virtual_memory_split_reserved(total_space_rs.base(), total_space_rs.size(), ccs_begin_offset, mtClassShared, mtClass); In one path `total_space_rs` gets initialized with `mtClass` and in another path it gets initialized with `mtClassShared`. This means that we always get the wrong flag in one of `archive_space_rs` and `class_space_rs`. src/hotspot/share/memory/virtualspace.hpp line 63: > 61: // it should not change after. 
> 62: // * _alignment - Not to be changed after initialization > 63: // * _executable - Not to be changed after initialization I think this would be a good change to do in the future, but currently this isn't true. `clear_members` do clear these fields, so I think you should remove these two lines. src/hotspot/share/memory/virtualspace.hpp line 76: > 74: > 75: MEMFLAGS nmt_flag() const { assert(is_reserved(), "Memory region is not reserved."); assert(_flag != mtNone, "Memory flag is not set."); return _flag; } > 76: Looking at this again, and realize that this function should probably be moved to the other accessors below. src/hotspot/share/memory/virtualspace.hpp line 98: > 96: bool special() const { assert(is_reserved(), "Memory region is not reserved."); return _special; } > 97: bool executable() const { assert(is_reserved(), "Memory region is not reserved."); return _executable; } > 98: size_t noaccess_prefix() const { assert(is_reserved(), "Memory region is not reserved."); return _noaccess_prefix; } FWIW, this change comes from one of my debugging sessions. I think it is good to have these asserts, I just wish they could says something like `assert(is_initialized(), ...)` to more clearly convey why we are doing this check. We are considering if there are ways to split ReservedSpace into two classes, one that handles reserving of memory and one that is a plain view of already reserved memory. If/when we do such a change we could consider updating these asserts to be more legible. In the meantime, it would be nice to change the string to "Fields not initialized" (and get rid of the `.`). src/hotspot/share/nmt/virtualMemoryTracker.cpp line 506: > 504: return true; > 505: assert(reserved_rgn->end() == rgn.end() || reserved_rgn->base() == rgn.base(), "extra memory should be at either end of the region."); > 506: } This seems like an extreme hack. 
I understand that this just follows the tradition of the rest of the hacks in this file, but can't this be better handled in the CDS layer above? ------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19343#pullrequestreview-2073757670 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611620811 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611624784 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611566535 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611567731 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611582321 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611608926 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611629459 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611630735 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611641554 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1611657743 From mbaesken at openjdk.org Thu May 23 13:25:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 23 May 2024 13:25:07 GMT Subject: Integrated: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:30:01 GMT, Matthias Baesken wrote: > When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr > this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp . 
> > /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' > #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 > #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 > #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 > #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 > #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 > #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 > #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 > #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 > #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 > #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 > #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 > #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 > #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 > #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) This pull request has now been integrated. 
Changeset: e19a421c Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/e19a421c30534566ba0dea0fa84f812ebeecfc87 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' Reviewed-by: stefank, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/19349 From mbaesken at openjdk.org Thu May 23 13:25:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 23 May 2024 13:25:07 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' [v2] In-Reply-To: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> References: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> Message-ID: On Thu, 23 May 2024 07:48:30 GMT, Matthias Baesken wrote: >> When running hs :tier1 tests, with ubsan enabled (configure flag --enable-ubsan), in test runtime/CommandLine/PrintClasses_id0.jtr >> this error is reported ; seems we miss a nullptr check that is in place at similar coding in instanceKlass.cpp . 
>> >> /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' >> #0 0x7fed098d2362 in InstanceKlass::print_on(outputStream*) const /jdk/src/hotspot/share/oops/instanceKlass.cpp:3550 >> #1 0x7fed09897cdc in PrintClassClosure::do_klass(Klass*) /jdk/src/hotspot/share/oops/instanceKlass.cpp:2228 >> #2 0x7fed08bed334 in ClassLoaderData::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderData.cpp:387 >> #3 0x7fed08c06403 in ClassLoaderDataGraph::classes_do(KlassClosure*) /jdk/src/hotspot/share/classfile/classLoaderDataGraph.cpp:303 >> #4 0x7fed09108768 in VM_PrintClasses::doit() /jdk/src/hotspot/share/services/diagnosticCommand.cpp:989 >> #5 0x7fed0b776c38 in VM_Operation::evaluate() /jdk/src/hotspot/share/runtime/vmOperations.cpp:75 >> #6 0x7fed0b7af23e in VMThread::evaluate_operation(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:283 >> #7 0x7fed0b7b0a67 in VMThread::inner_execute(VM_Operation*) /jdk/src/hotspot/share/runtime/vmThread.cpp:427 >> #8 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:493 >> #9 0x7fed0b7b1681 in VMThread::loop() /jdk/src/hotspot/share/runtime/vmThread.cpp:478 >> #10 0x7fed0b7b182d in VMThread::run() /jdk/src/hotspot/share/runtime/vmThread.cpp:177 >> #11 0x7fed0b4e8b0f in Thread::call_run() /jdk/src/hotspot/share/runtime/thread.cpp:225 >> #12 0x7fed0a9dae75 in thread_native_entry /jdk/src/hotspot/os/linux/os_linux.cpp:846 >> #13 0x7fed10fed6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> #14 0x7fed1051550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust check Thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19349#issuecomment-2127097682 From liach at openjdk.org Thu May 23 13:31:01 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 23 May 2024 13:31:01 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > Please review this change that convert dynamic proxies implementations to hidden classes, intended to target JDK 24. > > Summary: > 1. Adds new implementation while preserving the old implementation behind `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. > 3. ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. > > Testing: tier1 and tier2 have no related failures. > > Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. I have updated the compatibility risk description of the CSR. My CSR proposes to allow dynamic unloading of the proxy implementation classes, but currently it's not implemented as they are strongly referenced in the ClassLoaderValue caches. Should I implement dynamic unloading suggested in the CSR in this patch, or should I do it later? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2127111543 From forax at univ-mlv.fr Thu May 23 13:45:14 2024 From: forax at univ-mlv.fr (forax at univ-mlv.fr) Date: Thu, 23 May 2024 15:45:14 +0200 (CEST) Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: <1193071457.29792413.1716464618330.JavaMail.zimbra@univ-eiffel.fr> Message-ID: <1969145685.29901931.1716471914849.JavaMail.zimbra@univ-eiffel.fr> > From: "-" > To: "Remi Forax" > Cc: "Chen Liang" , "core-libs-dev" > , "hotspot-dev" , > "kulla-dev" > Sent: Thursday, May 23, 2024 2:56:58 PM > Subject: Re: RFR: 8242888: Convert dynamic proxy to hidden classes > Hmm, I think Proxy being hidden in stacktraces might be an advantage; the same > happens for lambdas. > The main advantage of hidden classes compared to an explicit class with > classData is that it supports flexible unloading, which might be useful for > Proxy. Flexible unloading has a high cost in terms of memory: the class + methods, etc. need their own metaspace. While on paper it seems great, I have my doubts that it's a good idea to use that option for proxies, given that the Proxy API allows an unbounded number of proxy classes. That's why lambda proxies do not use flexible unloading anymore. > I still believe the flexible unloading advantage justifies the migration to > hidden classes.
> Chen Rémi > On Thu, May 23, 2024 at 6:43 AM Remi Forax <forax at univ-mlv.fr> wrote: >> ----- Original Message ----- >> > From: "Chen Liang" <liach at openjdk.org> >> > To: "core-libs-dev" <core-libs-dev at openjdk.org>, "hotspot-dev" <hotspot-dev at openjdk.org>, kulla-dev at openjdk.org >> > Sent: Thursday, May 23, 2024 1:28:01 PM >> > Subject: Re: RFR: 8242888: Convert dynamic proxy to hidden classes >> > On Thu, 23 May 2024 03:28:30 GMT, Chen Liang <liach at openjdk.org> wrote: >> >> Please review this change that convert dynamic proxies implementations to hidden >> >> classes, intended to target JDK 24. >> >> Summary: >> >> 1. Adds new implementation while preserving the old implementation behind >> >> `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. >> >> 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in >> >> native code; I updated native code to reuse that ClassLoader for Proxy support. >> >> 3. ProxyGenerator changes mainly involve using Class data to pass Method list >> >> (accessed in a single condy) and removal of obsolete setup code generation. >> >> Testing: tier1 and tier2 have no related failures. >> >> Comment: Since #8278, Proxy has been converted to ClassFile API, and >> >> infrastructure has changed; now, the migration to hidden classes is much >> >> cleaner and has less impact, such as preserving ProtectionDomain and dynamic >> >> module without "anchor classes", and avoiding java.lang.invoke package.
>> > A CSR targeting 24 describing the compatibility concerns and behavioral >> > differences is here, somehow not linked by skara: >> > https://bugs.openjdk.org/browse/JDK-8332770 >> > The incompatibilities were much greater in the previous iterations of this >> > issue, such as in dynamic modules, serialization, and in proxy class protection >> > domain. Now these aspects are addressed by this patch, the only real one left >> > is the change in stack trace. Feel free to raise other incompatibilities you >> > have discovered. >> I wonder if instead of using hidden classes, we should not use usual named >> classes and add a new Lookup.defineClass() that takes a classData as parameter. >> This will solve both the problem of the stacktrace and the problem of the >> roundtrip proxyClass != Class.forName(proxyClass.getName()). >> Rémi >> > ------------- >> > PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2126869679 From alanb at openjdk.org Thu May 23 14:00:12 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 23 May 2024 14:00:12 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 13:28:16 GMT, Chen Liang wrote: > I have updated the compatibility risk description of the CSR. > > My CSR proposes to allow dynamic unloading of the proxy implementation classes, but currently it's not implemented as they are strongly referenced in the ClassLoaderValue caches. Should I implement dynamic unloading suggested in the CSR in this patch, or should I do it later? I think the main compatibility concern is going to be that hidden classes don't have a binary name so we have to get a sense as to whether there are frameworks that do anything with the class name and Class.forName.
I suspect the work will also mean looking at cases where agents are somehow instrumenting proxy class (hidden classes are not modifiable). In the JDK 8 time frame we had to back out a change in this area due to one of the mocking tools filtering by class name and trying to redefine proxy classes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2127188839 From kcr at openjdk.org Thu May 23 14:00:14 2024 From: kcr at openjdk.org (Kevin Rushforth) Date: Thu, 23 May 2024 14:00:14 GMT Subject: RFR: 8331671: Implement JEP 472: Prepare to Restrict the Use of JNI [v8] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 06:20:51 GMT, Alan Bateman wrote: > > Further, I confirm that if I pass that option to jlink or jpackage when creating a custom runtime, there is no warning. > > Great! What about jpackage without a custom runtime, wondering if --java-options can be tested. Yes, pointing to an existing runtime works, too. In either mode (jpackage using an existing Java runtime vs running jlink to create a new one), the options specified by `jpackage --java-options` are written to the application's `.cfg` file and used when the application launcher is run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19213#issuecomment-2127188783 From sgibbons at openjdk.org Thu May 23 14:09:36 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 14:09:36 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v31] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/40a1e628..87b1ebe8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=29-30 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From mdoerr at openjdk.org Thu May 23 14:16:10 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 23 May 2024 14:16:10 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale 
well Message-ID: PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? How can we verify it? By comparing the performance using the micro benchmarks? ------------- Commit messages: - 8331117: [PPC64] secondary_super_cache does not scale well Changes: https://git.openjdk.org/jdk/pull/19368/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331117 Stats: 408 lines in 5 files changed: 408 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368 PR: https://git.openjdk.org/jdk/pull/19368 From gziemski at openjdk.org Thu May 23 14:54:21 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 23 May 2024 14:54:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v97] In-Reply-To: References: Message-ID: <3NF_EoANeRkuTkClYKS-VBP6Hd6oPiOmnEuGl4BLPZw=.5cddafbc-c8b4-4986-b955-522d1f17d640@github.com> On Thu, 23 May 2024 12:37:07 GMT, Thomas Stuefe wrote: >> I really prefer `upsert` over `insert`, the point is to show that it works for both *up*dating and in*sert*ing! It is pretty well established in database terminology. I can go with `add` if you're not comfortable with `upsert`. > > +1 for upsert. Its an established term. If `upsert` is part of the tree vocabulary, then I'm OK with that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611844342 From fyang at openjdk.org Thu May 23 15:07:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 23 May 2024 15:07:08 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:55:35 GMT, Robbin Ehn wrote: >> Hi, please consider! 
>> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed more comments > - Fixed comments Updated change looks good. It would be nice to see how much this will benefit performance. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2074260575 From jsjolen at openjdk.org Thu May 23 15:08:15 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 15:08:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v95] In-Reply-To: References: Message-ID: On Tue, 21 May 2024 06:18:53 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with four additional commits since the last revision: >> >> - Remove unused include >> - Basic tests for NativeCallStackStorage >> - Allow for passing in nr of buckets >> - Remove friend-ness > > test/hotspot/gtest/nmt/test_nmt_treap.cpp line 81: > >> 79: } >> 80: } >> 81: > > All nodes are same-sized, no? So we don't have to track individual allocations. We can just count them. In the end, counter must be 0. I don't get how the size matters, but I just added a counter. Fixed! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611870501 From jsjolen at openjdk.org Thu May 23 15:08:16 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 15:08:16 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 06:26:51 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > test/hotspot/gtest/nmt/test_vmatree.cpp line 351: > >> 349: struct SimpleVMATracker : public CHeapObj { >> 350: const size_t page_size = 4096; >> 351: enum Tpe { Reserved, Committed, Free }; > > Wow, we save one letter! ;-) Think about it, that's one whole byte per instance in the code! Very useful when pushing changes over a 75 baud connection to Mars :-P! Fixed. > test/hotspot/gtest/nmt/test_vmatree.cpp line 447: > >> 445: const size_t size = num_pages * page_size; >> 446: >> 447: const MEMFLAGS flag = (MEMFLAGS)(os::random() % mt_number_of_types); > > I would maybe scale down, not use mt_number_of_types. > > We want to stress merging, too. So the total number of possible states should not that large. E.g. 8 states, with e.g. 4 flags and 2 stacks. SGTM ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611869340 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611868042 From stuefe at openjdk.org Thu May 23 16:02:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 May 2024 16:02:04 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS In-Reply-To: References: Message-ID: On Sat, 11 May 2024 06:13:29 GMT, Thomas Stuefe wrote: > An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK. 
> > Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included. > > > ---- > > Note that I was torn between two ways to fix this: > > - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property > - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`). See this draft PR: https://github.com/openjdk/jdk/pull/19178 . > > The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required. > > I can go either way, though I have a slight preference for this PR, which is why I posted it. Any takers? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2127504392 From shade at openjdk.org Thu May 23 16:07:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 May 2024 16:07:05 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: <9h-ta3XTnzioy3Ghdeulm6FgZYDJb2y5mDdMLGw3oYc=.defe7ef1-15dd-451d-8b79-3688c1e7a1da@github.com> Message-ID: On Thu, 23 May 2024 10:55:41 GMT, Aleksey Shipilev wrote: >> Test is added as merge_dmb_all_kinds > > Right. I was implicitly thinking that we can do this without coding the explicit patterns into the test. As it stands now, it is hard to check that generated patterns are actually correct. Let me see if I can whip up a sample of what I had in mind. I was thinking about this: [improve-tests.patch](https://github.com/openjdk/jdk/files/15419452/improve-tests.patch). Note how it uses the constants for better readability, and also runs the test in both `AlwaysMergeDMB` modes. You might want to adapt other tests to similar pattern. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1611958667 From stuefe at openjdk.org Thu May 23 16:14:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 May 2024 16:14:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v95] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 15:05:51 GMT, Johan Sjölen wrote: >> test/hotspot/gtest/nmt/test_nmt_treap.cpp line 81: >> >>> 79: } >>> 80: } >>> 81: >> >> All nodes are same-sized, no? So we don't have to track individual allocations. We can just count them. In the end, counter must be 0. > > I don't get how the size matters, but I just added a counter. Fixed! Ugh, my comment was in the wrong place. It refers to LeakCheckedAllocator. You track every allocation via an own structure `Check`. I don't know why you do this. The only reason I can see is if you were to check that a to-be-free'd pointer exists, but exists only once, in the array. I don't see that. Therefore, if you don't plan to check that (and I don't think that is needed), you can just count: +1 on alloc, -1 on free. That works since all allocations have the same size (which you could check, btw, on allocate). Then, make sure that upon release, count is 0. No need for the Check structure array. 0.
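The counting approach suggested in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `CountingAllocator`, `allocate`, `deallocate` and `leak_free` are hypothetical names, not the `LeakCheckedAllocator` interface from the PR, and plain `malloc`/`free` stand in for whatever allocation the test actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the test allocator discussed above.
// It assumes every allocation has the same size, so one counter is
// enough: +1 on allocate, -1 on deallocate, 0 once all is released.
struct CountingAllocator {
  std::size_t expected_size = 0;  // size of the first allocation seen
  long live = 0;                  // number of outstanding allocations

  void* allocate(std::size_t size) {
    if (expected_size == 0) {
      expected_size = size;       // remember the node size on first use
    }
    // The same-size check mentioned in the message ("on allocate").
    assert(size == expected_size && "all nodes are expected to be same-sized");
    void* p = std::malloc(size);
    if (p != nullptr) {
      live++;
    }
    return p;
  }

  void deallocate(void* p) {
    if (p != nullptr) {
      live--;
      std::free(p);
    }
  }

  // True when every allocation has been handed back.
  bool leak_free() const { return live == 0; }
};
```

A test would allocate and release nodes through this object and check `leak_free()` at the end. Unlike a per-allocation record, a bare counter cannot tell which pointer leaked or detect a double free, which matches the trade-off described in the message.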
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611968887 From jsjolen at openjdk.org Thu May 23 16:25:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 16:25:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 05:38:44 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > test/hotspot/gtest/nmt/test_vmatree.cpp line 70: > >> 68: NativeCallStackStorage ncs(true); >> 69: NativeCallStackStorage::StackIndex si1 = ncs.push(stack1); >> 70: NativeCallStackStorage::StackIndex si2 = ncs.push(stack2); > > Should be provided by the fixup class, or as global statics. Doing this through the usage of a fixture class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611989223 From jsjolen at openjdk.org Thu May 23 16:25:15 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 16:25:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 04:48:18 GMT, Thomas Stuefe wrote: >> test/hotspot/gtest/nmt/test_vmatree.cpp line 71: >> >>> 69: NativeCallStackStorage::StackIndex si1 = ncs.push(stack1); >>> 70: NativeCallStackStorage::StackIndex si2 = ncs.push(stack2); >>> 71: >> >> Please make these auto functions normal conventional functions. I really dislike that style. >> >> It has real disadvantages. For example, no IDE I know can resolve a call graph across them, or even into them. E.g., in CDT, one of the most capable C++ IDEs, the call graph for VMATreeTest::in_type_of gives me .... nothing. I need to do a dumb full text search to find the call sites. And if the term to search for is very generic, I am out of luck. 
I know several developers that rely heavily on call graph search in CDT, I certainly do. >> >> New techniques should serve a purpose. Lambdas as replacement for our old Closures makes sense, since they bring benefit (avoiding runtime polymorphy). But here I don't see the point. > > Also, please split all of these scopes up into individual TESTs. A finer granularity allows you to reproduce individual TESTS without having to sit through 20+ irrelevant ones. It also makes for cleaner code. I didn't know about the call graph issue, thanks for explaining. Splitting these out into explicit tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1611986725 From jsjolen at openjdk.org Thu May 23 16:42:12 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 23 May 2024 16:42:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: <7mvAVR2Qfa10hYSXXxaL1yXpq6qbvvXFtqu-9-unCCk=.3802b0a1-8bc6-4f89-844a-affa2bf1788b@github.com> References: <7mvAVR2Qfa10hYSXXxaL1yXpq6qbvvXFtqu-9-unCCk=.3802b0a1-8bc6-4f89-844a-affa2bf1788b@github.com> Message-ID: On Thu, 23 May 2024 08:26:13 GMT, Thomas Stuefe wrote: >> test/hotspot/gtest/nmt/test_vmatree.cpp line 156: >> >>> 154: } >>> 155: i++; >>> 156: }); >> >> This doesn't really test the state, nor the stack. It also seems to be a lot of code for a single-use test. Similar in other tests. >> >> --- >> >> Proposal: since you have the recurring pattern of doing something with the tree, then checking its expected state, write a check function in the fixture (e.g. assert_tree_state() ) that does that. >> >> Then use it like this: >> >> // Committing in middle of reservation ends with a sequence of 4 nodes >> TEST_VM_F(VMATreeTest, commit_in_middle_of_reservation) { >> >> ... reserve, commit in middle, then >> >> const AddressState expected_states[] = { >> { 0, .... }, >> { 25, .... }, >> { 50, .... }, >> { 100, .... 
} };
>>   assert_tree_state(expected_states, 4);
>> }
>>
>>
>> And to preserve your sanity when constructing AddressState with its many components, you can add a helper that allows that in a human-friendly form.
>>
>> For example:
>> A string that encodes flag, stack, and reserved/committed state in three letters.
>> First letter: A-H, let this be one of 8 selected MEMFLAGS, or - for mtNone
>> Second letter: a-d, let this be one of 4 selected stacks, or - for no stack/empty
>> Third letter: R or C for reserved or committed, or - for unreserved
>> Season to taste.
>>
>> Then, e.g. for reserving 0..100, and committing 25..50, you write:
>>
>>
>> const AddressState expected_states[] = {
>>   makestate(  0, "---", "AaR"),
>>   makestate( 25, "AaR", "AaC"),
>>   makestate( 50, "AaC", "AaR"),
>>   makestate(100, "AaR", "---") };
>> assert_tree_state(expected_states, 4);
>>
>>
>> Now you can write a bunch of corner-case tests without being too verbose, and the intent is immediately clear. As an added bonus, it checks for tree state *exactly*, e.g. which nodes are part of the tree, which are not, in which order, etc.
>>
>> If you reuse that string format for reserving etc, you could do this:
>>
>>
>> reserve_or_commit(0, 100, "AaR");
>> reserve_or_commit(25, 50, "AaC");
>>
>>
>> or just
>>
>>
>> do(0, 100, "AaR");
>> do(25, 50, "AaC");
>>
>>
>> From there on, one could even auto-build the expected state, but if too much automatism goes into the test, this increases the possibility for errors creeping in that one never finds because they make the test go green.
>
> While going for a walk after writing this, I realized an alternative to a string would be just to define those AddressState directly as global constants.
>
> static const AddressState AaR = ....
>
> With e.g. 4 flags, 2 stacks and 3 states, this would come to 24 states. With a macro, this could even be easier.
>
> Up to you.
>
> If you put those into an array, you can later on in the random-tester-function just choose a state randomly from that array by random index. You don't have to roll the dice three times to select flag, stack and state.

I'm going to take some time to digest these ideas. I'm not a fan of the string-based approach, I much prefer longer but more obvious code. The latter approach might work out. Still, a general tool is to me not preferable. The typical case is probably not that you need to read every test case, but a specific one which fails. Having some repetition in other tests shouldn't bother you then, but having to jump into a generalized DSL for state assertions does.

>This doesn't really test the state, nor the stack. It also seems to be a lot of code for a single-use test. Similar in other tests.

Sorry, I don't understand what you mean by this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612015180

From sgibbons at openjdk.org  Thu May 23 17:04:26 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Thu, 23 May 2024 17:04:26 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v32]
In-Reply-To:
References:
Message-ID: <79fqpujoxeB-9xiWMWM9tTYQRsOqS6vHP4poomY0DSU=.7d52f61f-cafc-4a62-b27e-7ec9e35103ef@github.com>

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Check macos build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/87b1ebe8..23d2c511 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=30-31 Stats: 109 lines in 1 file changed: 42 ins; 4 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Thu May 23 17:08:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 17:08:02 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v2] 
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 09:16:22 GMT, Daniel Jeliński wrote:

>> The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs.
>>
>> This PR addresses the problem by using bit operations to calculate the required number of bytes.
>>
>> Tier1-3 tests passed.
>>
>> On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing.
>
> Daniel Jeliński has updated the pull request incrementally with two additional commits since the last revision:
>
> - Explicit typecasts
> - Change to unsigned instead

GHA x86-32 build issue:

/work/jdk/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp: In member function ‘void MacroAssembler::align(uint)’:
/work/jdk/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:1162:18: error: comparison of integer expressions of different signedness: ‘uint’ {aka ‘unsigned int’} and ‘intx’ {aka ‘int’} [-Werror=sign-compare]
 1162 |   assert(modulus <= CodeEntryAlignment, "Alignment must be <= CodeEntryAlignment");
      |          ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19353#issuecomment-2127656170

From stuefe at openjdk.org  Thu May 23 17:25:22 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 23 May 2024 17:25:22 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 16:21:53 GMT, Johan Sjölen wrote:

> I didn't know about the call graph issue, thanks for explaining. Splitting these out into explicit tests.

You should use CDT, it's awesome

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612065462

From sgibbons at openjdk.org  Thu May 23 17:25:34 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Thu, 23 May 2024 17:25:34 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]
In-Reply-To:
References:
Message-ID:

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix for IndexOf.java on mac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/23d2c511..cba6ffbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=31-32 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From wkemper at openjdk.org Thu May 23 17:47:03 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 23 May 2024 17:47:03 GMT Subject: RFR: 8332082: Shenandoah: Use consistent tests 
to determine when pre-write barrier is active [v3] In-Reply-To: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com> References: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com> Message-ID: <4G16f8wLsBeOVAGQthPsdtG4UcdwykXyEFYy1w2BiFk=.b5d697f2-0455-4807-af8e-3de60b397f65@github.com> On Mon, 20 May 2024 16:59:25 GMT, William Kemper wrote: >> This is consistent with c1 and other platforms. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Total time here is defined as the sum of 10 measurement runs for each of the 20 working benchmarks in the dacapo suite. The total time was taken for `jdk:master` and this PR with `-XX:TieredStopAtLevel=1` to keep all benchmarks running in C1. jdk:master: Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum result | 200 | 2110857.000 | 7656.397 | 10554.285 | 9063.875 | 8894.464 | 1159.000 | 36836.000 This PR: Category | Count | Total | GeoMean | Average | Trim 0.1 | StdDev | Minimum | Maximum result | 200 | 2139852.000 | 7711.781 | 10699.260 | 9129.150 | 9166.330 | 1167.000 | 37336.000 This is a 1.4% increase in total time when running with just C1. The increase at trimmed mean 10% is just 0.7%. I'm okay with this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19180#issuecomment-2127724163 From coleenp at openjdk.org Thu May 23 17:51:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 23 May 2024 17:51:04 GMT Subject: RFR: 8332745: Method::is_vanilla_constructor is never used In-Reply-To: References: Message-ID: On Thu, 23 May 2024 13:00:49 GMT, Dan Heidinga wrote: > Removed dead code related to identifying empty constructors. Missed when [JDK-8057777](https://bugs.openjdk.org/browse/JDK-8057777) cleaned up JVM_AllocateNewObject. > > Passes mach5 tier1. This looks really good. 
Thank you for finding this unused code.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19367#pullrequestreview-2074659158

From djelinski at openjdk.org  Thu May 23 18:15:27 2024
From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=)
Date: Thu, 23 May 2024 18:15:27 GMT
Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v2]
In-Reply-To:
References:
Message-ID: <7WiNc6C0x-C4TM9PKE4FWFuG6bf-qOSz6Z0NrUWf-fw=.37e41761-f8d1-4daa-b9f8-551fa4dc4e6a@github.com>

On Thu, 23 May 2024 09:16:22 GMT, Daniel Jeliński wrote:

>> The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs.
>>
>> This PR addresses the problem by using bit operations to calculate the required number of bytes.
>>
>> Tier1-3 tests passed.
>>
>> On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing.
>
> Daniel Jeliński has updated the pull request incrementally with two additional commits since the last revision:
>
> - Explicit typecasts
> - Change to unsigned instead

Thanks for pointing it out! Should be fixed now, will wait for GHA to confirm...
------------- PR Comment: https://git.openjdk.org/jdk/pull/19353#issuecomment-2127767000 From djelinski at openjdk.org Thu May 23 18:15:27 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 23 May 2024 18:15:27 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v3] In-Reply-To: References: Message-ID: > The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. > > This PR addresses the problem by using bit operations to calculate the required number of bytes. > > Tier1-3 tests passed. > > On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. 
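To make the description above concrete, here is a small standalone sketch of the two ways of computing the padding. Both helpers are hypothetical (`padding_for` and `padding_signed_mod` are names invented here) and are not copied from `macroAssembler_x86.cpp`; the second one only illustrates how a signed remainder could produce the reported 63 bytes, assuming the old computation had that shape.

```cpp
#include <cassert>
#include <cstdint>

// Padding bytes needed to bring `offset` up to the next multiple of `modulus`,
// computed with bit operations. `modulus` must be a power of two.
// Hypothetical helper for illustration only.
inline uint32_t padding_for(uint64_t offset, uint32_t modulus) {
  assert(modulus != 0 && (modulus & (modulus - 1)) == 0);
  const uint64_t mask = modulus - 1;
  return static_cast<uint32_t>((modulus - (offset & mask)) & mask);
}

// Assumed shape of the failure mode: with a signed 32-bit target, `%` yields
// a negative remainder for negative targets, so `modulus - rem` can exceed
// `modulus`.
inline int32_t padding_signed_mod(int32_t target, int32_t modulus) {
  const int32_t rem = target % modulus;  // negative when target is negative
  return rem == 0 ? 0 : modulus - rem;
}
```

For a code address whose low 32 bits are `0xFFFFFFE1`, the signed view is -31, so the remainder-based version asks for 32 - (-31) = 63 bytes, while the masked version asks for the 31 bytes that are actually needed to reach the next 32-byte boundary.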
Daniel Jeli?ski has updated the pull request incrementally with one additional commit since the last revision: Fix 32-bit compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19353/files - new: https://git.openjdk.org/jdk/pull/19353/files/d0220193..6a7021a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19353&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19353&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19353/head:pull/19353 PR: https://git.openjdk.org/jdk/pull/19353 From shade at openjdk.org Thu May 23 18:18:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 May 2024 18:18:02 GMT Subject: RFR: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active [v3] In-Reply-To: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com> References: <7YitGep10T35vf9lzitE2Oz3A9XwZywdDpgeiQoMXho=.7bb368d9-ea10-447d-ad29-6429f8ef6631@github.com> Message-ID: On Mon, 20 May 2024 16:59:25 GMT, William Kemper wrote: >> This is consistent with c1 and other platforms. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Agreed, this is acceptable. We could resolve it in future by expanding gc-state to several bytes. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19180#pullrequestreview-2074711115 From djelinski at openjdk.org Thu May 23 19:12:12 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 23 May 2024 19:12:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 17:25:34 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix for IndexOf.java on mac src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268: > 266: __ cmpq(needle_len_p, 0); > 267: __ jg_b(L_nextCheck); > 268: __ xorq(rax, rax); out of curiosity, is there any advantage to using `xorq` instead of `xorl` here? https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449: > 447: __ cmpq(r13, NUMBER_OF_CASES - 1); > 448: __ ja(L_smallCaseDefault); > 449: __ mov64(r15, (int64_t)small_jump_table); would it make sense to use `lea` here? 
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803: > 801: __ movq(index, needle_len); > 802: __ andq(index, 0xf); // nLen % 16 > 803: __ movq(offset, 0x10); `movl` or `movptr` would produce a shorter encoding src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1544: > 1542: } > 1543: > 1544: __ align(8); why `8` and not `OptoLoopAlignment` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612178285 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612179069 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612180163 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612183311 From iwalulya at openjdk.org Thu May 23 19:15:05 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 23 May 2024 19:15:05 GMT Subject: RFR: 8332139: SymbolTableHash::Node allocations allocates twice the required memory In-Reply-To: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> References: <8Q1-f5OGC6_vqM0W-k370VibVVLs7M8Dsyyele4FWT8=.53e09e58-0b6d-437c-85e4-ca89de97c123@github.com> Message-ID: <8Yk0aySE92_5tWrNKL374BOTWOmYby2vvfSOUd6MZM8=.411b0ba8-5385-46ca-b433-1d5aadd1e64f@github.com> On Mon, 13 May 2024 12:30:38 GMT, Axel Boldt-Christmas wrote: > The symbols are inline and allocated together with the ConcurrentHashTable (CHT) Nodes. 
The calculation used for the required size is `alloc_size = size + value.byte_size() + value.effective_length();` > > Where > * `size == sizeof(SymbolTableHash::Node) == sizeof(void*) + sizeof(Symbol)` > * `value.byte_size() == dynamic_sizeof(Symbol) == sizeof(Symbol) + ` > * `value.effective_length() == dynamic_sizeof(Symbol) - sizeof(Symbol) == ` > > So `alloc_size` ends up being `sizeof(void*) /* node metadata */ + 2 * dynamic_sizeof(Symbol)` > > Because using the CHT with dynamically sized (and inlined) types requires knowing about its implementation details I chose to make the functionality for calculating the the allocation size a property of the CHT. It now queries the CHT for the node allocation size given the dynamic size required for the VALUE. > > The only current (implicit) restriction regarding using dynamically sized (and inlined) types in CHT is that the _value field C++ object ends where the Node object ends, so there is not padding bytes where the dynamic payload is allocated. (effectively `sizeof(VALUE) % alignof(Node) == 0` as long as there are no non-standard alignment fields in the Node metadata). I chose to test this as a runtime assert that the _value ends where the Node object ends, instead of a static assert with the alignment as it seemed to more explicitly show the intent of the check. > > Running testing tier1-7 LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19214#pullrequestreview-2074807641

From sgibbons at openjdk.org  Thu May 23 19:49:11 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Thu, 23 May 2024 19:49:11 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 19:02:05 GMT, Daniel Jeliński wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix for IndexOf.java on mac
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268:
>
>> 266: __ cmpq(needle_len_p, 0);
>> 267: __ jg_b(L_nextCheck);
>> 268: __ xorq(rax, rax);
>
> out of curiosity, is there any advantage to using `xorq` instead of `xorl` here?
>
> https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now.

Thanks for finding this. It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't. Fixed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449:
>
>> 447: __ cmpq(r13, NUMBER_OF_CASES - 1);
>> 448: __ ja(L_smallCaseDefault);
>> 449: __ mov64(r15, (int64_t)small_jump_table);
>
> would it make sense to use `lea` here?

It may, but I believe the movq is shorter (although maybe not to r15). I'll do some experimentation.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803:
>
>> 801: __ movq(index, needle_len);
>> 802: __ andq(index, 0xf); // nLen % 16
>> 803: __ movq(offset, 0x10);
>
> `movl` or `movptr` would produce a shorter encoding

I tried to be consistent with the whole {q,l} syntax throughout when referring to each symbolic register. I feel that changing this would ripple through the code. @sviswa7 what do you think?

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1544:
>
>> 1542: }
>> 1543:
>> 1544: __ align(8);
>
> why `8` and not `OptoLoopAlignment` ?
Short answer - because I didn't know there was such a thing as `OptoLoopAlignment`. I'll change that throughout at the top of my loops. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612201503 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612207461 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612216483 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612218363 From sgibbons at openjdk.org Thu May 23 19:54:39 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 19:54:39 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: References: Message-ID: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x 
> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/cba6ffbe..2283f2bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=32-33 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Thu May 23 20:09:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 20:09:01 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v3] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 18:15:27 GMT, Daniel Jeli?ski wrote: >> The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. >> >> This PR addresses the problem by using bit operations to calculate the required number of bytes. >> >> Tier1-3 tests passed. >> >> On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. 
> > Daniel Jeli?ski has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32-bit compilation Good. We need to revisit types of flags which hold small values. `intx` and `uintx` are 64-bits values in 64-bits VM. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19353#pullrequestreview-2074899851 From matsaave at openjdk.org Thu May 23 20:53:03 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 23 May 2024 20:53:03 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v2] In-Reply-To: <7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> Message-ID: <6vxtp58v6Nz74xdb5BbmEjDqvk5IDeRlUjJ6sDNFSC0=.2d8868a2-30f6-4e7e-a0cc-8a4b47998508@github.com> On Thu, 23 May 2024 03:35:19 GMT, Ioi Lam wrote: >> ### Overview >> >> This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. >> >> I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, >> - `B` is the same class as `A`; or >> - `B` is a supertype of `A`; or >> - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. >> >> Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. >> >> Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. 
>> >> (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) >> >> ### Static CDS Archive >> >> This feature is implemented in three steps for static CDS archive dump: >> >> 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: >> >> @cp java/util/Objects 2 19 106 >> >> 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. >> >> 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. >> >> ### Dynamic CDS Archive >> >> When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. >> >> ### Limitations >> >> - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. >> - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8293980-resolve-fields-at-dumptime > - 8293980: Resolve CONSTANT_FieldRef at CDS dump time After making a quick first pass over this, I have some comments about the constant pool and cpcache code. src/hotspot/share/oops/constantPool.cpp line 301: > 299: objArrayOop rr = resolved_references(); > 300: if (rr != nullptr) { > 301: ConstantPool* orig_pool = ArchiveBuilder::current()->get_source_addr(this); Are the changes below necessary? I think the original was fine but I may be missing the point of this change. src/hotspot/share/oops/constantPool.cpp line 464: > 462: if (cache() != nullptr) { > 463: // cache() is null if this class is not yet linked. > 464: remove_resolved_field_entries_if_non_deterministic(); These methods look like they can belong to the constant pool cache instead. Can cpCache call the ClassLinker code instead so this can be part of `cache()->remove_unshareable_info()`? src/hotspot/share/oops/constantPool.cpp line 520: > 518: int cp_index = rfi->constant_pool_index(); > 519: bool archived = false; > 520: bool resolved = rfi->is_resolved(Bytecodes::_putfield) || Is one of these meant to be `is_resolved(Bytecodes::get_field)` ? src/hotspot/share/oops/resolvedFieldEntry.hpp line 65: > 63: _tos_state = other._tos_state; > 64: _flags = other._flags; > 65: _get_code = other._get_code; The fields `_get_code` and `_put_code` are normally set atomically, does this need to be the case when copying as well? 
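The `is_resolved` question in the review above can be illustrated with a minimal, self-contained C++ sketch. This is not the HotSpot code: `ToyFieldEntry`, `Code`, and `is_resolved_for_any_access` are hypothetical names, and the sketch assumes only that an entry records the "get" and "put" sides of a field access separately.

```cpp
#include <cstdint>

// Hypothetical stand-in for the bytecodes of interest; not HotSpot's Bytecodes::Code.
enum class Code : uint8_t { none = 0, getfield = 1, putfield = 2 };

// Toy model of a resolved field entry: the get and put sides are resolved independently.
struct ToyFieldEntry {
  Code get_code = Code::none;
  Code put_code = Code::none;

  // True only if the side matching the given bytecode has been resolved.
  bool is_resolved(Code c) const {
    switch (c) {
      case Code::getfield: return get_code == c;
      case Code::putfield: return put_code == c;
      default:             return false;
    }
  }
};

// An entry counts as resolved if either kind of access has been resolved.
inline bool is_resolved_for_any_access(const ToyFieldEntry& e) {
  return e.is_resolved(Code::getfield) || e.is_resolved(Code::putfield);
}
```

In this model, testing the put side twice would miss an entry that was resolved only for reads, which is presumably the concern behind the question.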
------------- PR Review: https://git.openjdk.org/jdk/pull/19355#pullrequestreview-2074929387 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1612265561 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1612288001 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1612277435 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1612261360 From cslucas at openjdk.org Thu May 23 21:55:08 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 23 May 2024 21:55:08 GMT Subject: RFR: JDK-8325841 - Remove unused references to newInstance0 Message-ID: Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. ------------- Commit messages: - Remove unused symbol names Changes: https://git.openjdk.org/jdk/pull/19374/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19374&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325841 Stats: 57 lines in 2 files changed: 0 ins; 56 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19374.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19374/head:pull/19374 PR: https://git.openjdk.org/jdk/pull/19374 From iklam at openjdk.org Thu May 23 22:11:05 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 23 May 2024 22:11:05 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v2] In-Reply-To: <6vxtp58v6Nz74xdb5BbmEjDqvk5IDeRlUjJ6sDNFSC0=.2d8868a2-30f6-4e7e-a0cc-8a4b47998508@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> <6vxtp58v6Nz74xdb5BbmEjDqvk5IDeRlUjJ6sDNFSC0=.2d8868a2-30f6-4e7e-a0cc-8a4b47998508@github.com> 
Message-ID: On Thu, 23 May 2024 20:28:49 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8293980-resolve-fields-at-dumptime >> - 8293980: Resolve CONSTANT_FieldRef at CDS dump time > > src/hotspot/share/oops/constantPool.cpp line 301: > >> 299: objArrayOop rr = resolved_references(); >> 300: if (rr != nullptr) { >> 301: ConstantPool* orig_pool = ArchiveBuilder::current()->get_source_addr(this); > > Are the changes below necessary? I think the original was fine but I may be missing the point of this change. It's just for consistency. "source" is the terminology used in the comments in archiveBuilder.cpp. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1612386810 From kvn at openjdk.org Thu May 23 22:11:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 22:11:12 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> Message-ID: On Thu, 23 May 2024 19:54:39 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>>
>> Benchmark                               Score       Latest
>> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
>> StringIndexOf.success                   9.186       9.713       1.057369911x
>> StringIndexOf.successBig                14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Addressing review comments

A few suggestions

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250:

> 4248: generate_chacha_stubs();
> 4249:
> 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) {

`#ifdef COMPILER2` around this code to exclude JVMCI only case.

src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 582:

> 580:
> 581: #ifdef COMPILER2
> 582: void generate_string_indexof_stubs(address *fnptrs, StrIntrinsicNode::ArgEncoding ae);

Is it possible to make `generate_string_indexof_stubs()` as local static method in `stubGenerator_x86_64_string.cpp` and pass `StubGenerator*` as argument?
Then you don't need to include "opto/intrinsicnode.hpp" here.
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, Intel Corporation. All rights reserved. 2024 year src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 27: > 25: > 26: #include "precompiled.hpp" > 27: #ifdef COMPILER2 You can exclude this file completely from compilation without this `#ifdef` if you prefix the name with `c2_`. There is code in make files to exclude such files: [JvmFeatures.gmk#L38](https://github.com/openjdk/jdk/blob/master/make/hotspot/lib/JvmFeatures.gmk#L38) ------------- PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2075150606 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612352891 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612383969 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612365050 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612375730 From kvn at openjdk.org Thu May 23 22:11:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 22:11:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> Message-ID: On Thu, 23 May 2024 21:50:15 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: > >> 4248: generate_chacha_stubs(); >> 4249: >> 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { > > `#ifdef COMPILER2` around this code to exclude JVMCI only case. You don't need to check `VM_Version::supports_avx2()` because we reset `UseAVX` if avx2 is not supported. 
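The point about the redundant `VM_Version::supports_avx2()` check can be sketched in isolation. The following is a toy model, not HotSpot code: `ToyCpu`, `clamp_use_avx`, and `should_generate_avx2_stubs` are hypothetical names, and the sketch assumes only what the comment states, namely that the requested AVX level is lowered at startup when the CPU cannot support it.

```cpp
#include <algorithm>

// Hypothetical model of CPU capabilities; not HotSpot's VM_Version.
struct ToyCpu {
  int max_avx_level;  // highest AVX level the hardware supports (0..3)
  bool supports_avx2() const { return max_avx_level >= 2; }
};

// Lower the requested level to what the CPU supports, once, at startup.
inline int clamp_use_avx(int requested, const ToyCpu& cpu) {
  return std::min(requested, cpu.max_avx_level);
}

// After clamping, use_avx == 2 already implies AVX2 support,
// so no separate capability check is needed here.
inline bool should_generate_avx2_stubs(int use_avx, bool ecore_opts) {
  return use_avx == 2 && ecore_opts;
}
```

Under that assumption the flag value alone carries the capability information, so the guard reduces to the flag comparison and the feature switch.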
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612361847 From wkemper at openjdk.org Thu May 23 22:36:07 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 23 May 2024 22:36:07 GMT Subject: Integrated: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active In-Reply-To: References: Message-ID: On Fri, 10 May 2024 16:13:51 GMT, William Kemper wrote: > This is consistent with c1 and other platforms. This pull request has now been integrated. Changeset: ddd73b45 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/ddd73b458355bffeaa8e0e5017c27d6c6af2dc94 Stats: 57 lines in 6 files changed: 17 ins; 22 del; 18 mod 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active Reviewed-by: kdnilsen, shade ------------- PR: https://git.openjdk.org/jdk/pull/19180 From dholmes at openjdk.org Thu May 23 22:36:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 May 2024 22:36:11 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: Message-ID: On Mon, 13 May 2024 23:02:27 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. >> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. >> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. 
> > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > comments from Ioi Okay my first reaction here is "I object!". I get that Leyden wants to be able to easily compare startup costs between itself and mainline, but what is this costing mainline? Even if these counters are not active there is an impact on the code execution and I want to know that impact is negligible. The initialization logic seems a little off to me too - see comments below. src/hotspot/share/classfile/classLoader.cpp line 1477: > 1475: > 1476: jlong ClassLoader::class_init_count() { > 1477: return (UsePerfData) ? _perf_classes_inited->get_value() : -1; No need to add brackets here src/hotspot/share/classfile/classLoader.cpp line 1481: > 1479: > 1480: jlong ClassLoader::class_init_time_ms() { > 1481: return (UsePerfData) ? Or here src/hotspot/share/oops/instanceKlass.cpp line 1219: > 1217: } else { > 1218: // The elapsed time is so small it's not worth counting. > 1219: if (UsePerfData || ProfileClassLinkage) { You have to have UsePerfData being true for this work so you don't need the change. src/hotspot/share/runtime/arguments.cpp line 3759: > 3757: if (log_is_enabled(Info, init)) { > 3758: FLAG_SET_ERGO_IF_DEFAULT(ProfileClassLinkage, true); > 3759: } What if ProfileClassLinkage is set true on the command-line without -Xlog:init? That doesn't seem to make sense to me. So I'm not clear why it is a settable diagnostic flag. src/hotspot/share/runtime/perfData.hpp line 834: > 832: public: > 833: inline PerfTraceTime(PerfLongCounter* timerp) : _timerp(timerp) { > 834: if (!UsePerfData || timerp == nullptr) return; Okay so this is needed because the existence of some counters is gated on the ProfileClassLinkage flag. Style nit: use a { } block please. src/hotspot/share/runtime/perfData.hpp line 838: > 836: } > 837: > 838: const char* name() const { return _timerp->name(); } Do you need a null check here? 
------------- PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2075233160 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612397674 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612398041 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612401224 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612410143 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612406745 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612407157 From dholmes at openjdk.org Thu May 23 22:36:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 May 2024 22:36:11 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: Message-ID: <2TAhYdUQ5KXWODYMvzb15NqKhkXfFjV7RW9oHeVIg0U=.73940200-8be6-4427-9348-44d50fd22286@github.com> On Thu, 23 May 2024 22:17:43 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> comments from Ioi > > src/hotspot/share/classfile/classLoader.cpp line 1477: > >> 1475: >> 1476: jlong ClassLoader::class_init_count() { >> 1477: return (UsePerfData) ? _perf_classes_inited->get_value() : -1; > > No need to add brackets here Surely this needs to be checking `ProfileClassLinkage`, which in turn should be false if `UsePerfData` is false. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612399712 From kvn at openjdk.org Thu May 23 22:38:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 22:38:01 GMT Subject: RFR: JDK-8325841 - Remove unused references to newInstance0 In-Reply-To: References: Message-ID: On Thu, 23 May 2024 21:51:32 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? 
> > As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. Subject and description in JBS have to be updated since you removed not only newInstance0. Why GHA testing is not triggered? ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19374#pullrequestreview-2075248794 PR Comment: https://git.openjdk.org/jdk/pull/19374#issuecomment-2128147194 From sgibbons at openjdk.org Thu May 23 23:00:10 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 23 May 2024 23:00:10 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34] In-Reply-To: References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com> Message-ID: <5L1PFeLmHP6Lfg1bKx_tRU-ESTFfpqUbP9vHVbiaqPo=.c3fa3b1b-5433-4a68-b639-ef82b4a388d1@github.com> On Thu, 23 May 2024 21:56:39 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4250: >> >>> 4248: generate_chacha_stubs(); >>> 4249: >>> 4250: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { >> >> `#ifdef COMPILER2` around this code to exclude JVMCI only case. > > You don't need to check `VM_Version::supports_avx2()` because we reset `UseAVX` if avx2 is not supported. Done. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612396114

From sgibbons at openjdk.org Thu May 23 23:00:12 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Thu, 23 May 2024 23:00:12 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v34]
In-Reply-To:
References: <13CORNysYmupJ3F2_7ekNqob8pz_xNmTg8gyKIt5vgs=.572e9f52-62ea-44cd-bac4-ab99a09a7510@github.com>
Message-ID:

On Thu, 23 May 2024 22:06:38 GMT, Vladimir Kozlov wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Addressing review comments
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 582:
>
>> 580:
>> 581: #ifdef COMPILER2
>> 582: void generate_string_indexof_stubs(address *fnptrs, StrIntrinsicNode::ArgEncoding ae);
>
> Is it possible to make `generate_string_indexof_stubs()` as local static method in `stubGenerator_x86_64_string.cpp` and pass `StubGenerator*` as argument?
> Then you don't need to include "opto/intrinsicnode.hpp" here.

Done.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 2:
>
>> 1: /*
>> 2: * Copyright (c) 2023, Intel Corporation. All rights reserved.
>
> 2024 year

Fixed.

> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 27:
>
>> 25:
>> 26: #include "precompiled.hpp"
>> 27: #ifdef COMPILER2
>
> You can exclude this file completely from compilation without this `#ifdef` if you prefix the name with `c2_`.
> There is code in make files to exclude such files: [JvmFeatures.gmk#L38](https://github.com/openjdk/jdk/blob/master/make/hotspot/lib/JvmFeatures.gmk#L38)

I will change the name and remove the #ifdef. Thanks for this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612401461 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612399243 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612400071 From cslucas at openjdk.org Thu May 23 23:08:03 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 23 May 2024 23:08:03 GMT Subject: RFR: JDK-8325841 - Remove unused references to newInstance0 In-Reply-To: References: Message-ID: On Thu, 23 May 2024 22:35:42 GMT, Vladimir Kozlov wrote: > Why GHA testing is not triggered? I don't know. Maybe it has something to do with the Microsoft fork? It's the only thing I did different in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19374#issuecomment-2128171362 From cslucas at openjdk.org Thu May 23 23:11:07 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 23 May 2024 23:11:07 GMT Subject: RFR: JDK-8325841 - Remove unused references to vmSymbols.hpp In-Reply-To: References: Message-ID: On Thu, 23 May 2024 22:35:42 GMT, Vladimir Kozlov wrote: >> Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? >> >> As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. > > Why GHA testing is not triggered? @vnkozlov - I'm investigating this GHA issue on my side. Thank you for pointing that out. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19374#issuecomment-2128174164

From sgibbons at openjdk.org Thu May 23 23:12:42 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Thu, 23 May 2024 23:12:42 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]
In-Reply-To:
References:
Message-ID: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com>

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>
>
> Benchmark                               Score       Latest
> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
> StringIndexOf.success                   9.186       9.713       1.057369911x
> StringIndexOf.successBig                14.341      46.343      3.231504079x
> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/16753/files
- new: https://git.openjdk.org/jdk/pull/16753/files/2283f2bf..c034d3f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=33-34 Stats: 73 lines in 3 files changed: 6 ins; 59 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Thu May 23 23:22:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 23:22:05 GMT Subject: RFR: JDK-8325841 - Remove unused references to vmSymbols.hpp In-Reply-To: References: Message-ID: On Thu, 23 May 2024 21:51:32 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? > > As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. I submitted our testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19374#issuecomment-2128183293 From liach at openjdk.org Thu May 23 23:27:01 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 23 May 2024 23:27:01 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > Please review this change that convert dynamic proxies implementations to hidden classes, intended to target JDK 24. > > Summary: > 1. Adds new implementation while preserving the old implementation behind `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. > 3. 
ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. > > Testing: tier1 and tier2 have no related failures. > > Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. Hmm, actually, looking at the specs of the method again, does it imply that Proxy classes are never unloaded once defined in a ClassLoader, as seen in `Proxy::getProxyClass`: > If a proxy class for the same permutation of interfaces has already been defined by the class loader, then the existing proxy class will be returned If that's the case, Remi's suggestion on passing classdata to a non-hidden class might be better, and it seems to accomplish that in hotspot isn't too hard too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2128186405 From cslucas at openjdk.org Thu May 23 23:34:00 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 23 May 2024 23:34:00 GMT Subject: RFR: JDK-8325841 - Remove unused references to vmSymbols.hpp In-Reply-To: References: Message-ID: On Thu, 23 May 2024 21:51:32 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? > > As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. GHA is fixed now. It was a security/configuration issue in our repo. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19374#issuecomment-2128191463 From kvn at openjdk.org Thu May 23 23:42:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 23 May 2024 23:42:09 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit 
since the last revision: > > Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp I submitted our testing for latest v34 version of changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2128207810 From amenkov at openjdk.org Fri May 24 00:35:05 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 24 May 2024 00:35:05 GMT Subject: Integrated: 8331683: Clean up GetCarrierThread In-Reply-To: References: Message-ID: On Sat, 18 May 2024 00:47:59 GMT, Alex Menkov wrote: > JVMTI GetCarrierThread extension function was introduced by loom for testing. > It's used by several tests in hotspot/jtreg/serviceability. > > Testings: tier1..tier6 This pull request has now been integrated. Changeset: 424eb60d Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/424eb60dedb332237b8ec97e9da6bd95442c0083 Stats: 37 lines in 3 files changed: 4 ins; 27 del; 6 mod 8331683: Clean up GetCarrierThread Reviewed-by: sspitsyn, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/19289 From kvn at openjdk.org Fri May 24 00:50:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 00:50:10 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>>
>> Benchmark                               Score       Latest
>> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
>> StringIndexOf.success                   9.186       9.713       1.057369911x
>> StringIndexOf.successBig                14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp

test/jdk/java/lang/StringBuffer/IndexOf.java line 2:

> 1: /*
> 2: * Copyright (c) 2000, 2024 Oracle and/or its affiliates. All rights reserved.

This is a copyright header validation failure. Missing comma `,` after 2024.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612519675 From kvn at openjdk.org Fri May 24 01:03:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 01:03:13 GMT Subject: RFR: JDK-8325841 - Remove unused references to vmSymbols.hpp In-Reply-To: References: Message-ID: On Thu, 23 May 2024 21:51:32 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? > > As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. My testing (tier1 and various build configurations) passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19374#pullrequestreview-2075400788 From cslucas at openjdk.org Fri May 24 02:03:00 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 24 May 2024 02:03:00 GMT Subject: RFR: JDK-8325841 - Remove unused references to vmSymbols.hpp In-Reply-To: References: Message-ID: On Thu, 23 May 2024 23:19:51 GMT, Vladimir Kozlov wrote: >> Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? >> >> As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. > > I submitted our testing. Thank you @vnkozlov . ------------- PR Comment: https://git.openjdk.org/jdk/pull/19374#issuecomment-2128350224 From cslucas at openjdk.org Fri May 24 02:06:14 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 24 May 2024 02:06:14 GMT Subject: RFR: JDK-8324341 : Remove redundant preprocessor #if's checks Message-ID: Can I please get some reviews for this change to remove some redundant #if / #ifdefs ? 
My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. ------------- Commit messages: - Remove redundant nested #ifs Changes: https://git.openjdk.org/jdk/pull/19378/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19378&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324341 Stats: 16 lines in 6 files changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19378.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19378/head:pull/19378 PR: https://git.openjdk.org/jdk/pull/19378 From ccheung at openjdk.org Fri May 24 05:24:04 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 24 May 2024 05:24:04 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: Message-ID: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> On Thu, 23 May 2024 22:33:09 GMT, David Holmes wrote: > Okay my first reaction here is "I object!". I get that Leyden wants to be able to easily compare startup costs between itself and mainline, but what is this costing mainline? Even if these counters are not active there is an impact on the code execution and I want to know that impact is negligible. I added some perf numbers for various startup benchmarks in the bug report [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14675860&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14675860). > src/hotspot/share/classfile/classLoader.cpp line 1481: > >> 1479: >> 1480: jlong ClassLoader::class_init_time_ms() { >> 1481: return (UsePerfData) ? > > Or here Please refer to my reply above. > src/hotspot/share/oops/instanceKlass.cpp line 1219: > >> 1217: } else { >> 1218: // The elapsed time is so small it's not worth counting. 
>> 1219: if (UsePerfData || ProfileClassLinkage) { > > You have to have UsePerfData being true for this work so you don't need the change. Right, the `_perf_classes_inited` counter is a pre-existing counter whose creation depends on `UsePerfData`. > src/hotspot/share/runtime/arguments.cpp line 3759: > >> 3757: if (log_is_enabled(Info, init)) { >> 3758: FLAG_SET_ERGO_IF_DEFAULT(ProfileClassLinkage, true); >> 3759: } > > What if ProfileClassLinkage is set true on the command-line without -Xlog:init? That doesn't seem to make sense to me. So I'm not clear why it is a settable diagnostic flag. If only `ProfileClassLinkage` is set to true without `-Xlog:init`, the user will not see any counters output. In `java.cpp`: 160 void log_vm_init_stats() { 161 LogStreamHandle(Info, init) log; 162 if (log.is_enabled()) { 163 ClassLoader::print_counters(); 164 } 165 } In the future, there will be other sets of counters controlled by other diagnostic flags. > src/hotspot/share/runtime/perfData.hpp line 834: > >> 832: public: >> 833: inline PerfTraceTime(PerfLongCounter* timerp) : _timerp(timerp) { >> 834: if (!UsePerfData || timerp == nullptr) return; > > Okay so this is needed because the existence of some counters is gated on the ProfileClassLinkage flag. > > Style nit: use a { } block please. Will fix. > src/hotspot/share/runtime/perfData.hpp line 838: > >> 836: } >> 837: >> 838: const char* name() const { return _timerp->name(); } > > Do you need a null check here? Will add a null check. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18790#issuecomment-2128547516 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612774777 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612775255 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612776411 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612775580 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612775821 From ccheung at openjdk.org Fri May 24 05:24:05 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 24 May 2024 05:24:05 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <2TAhYdUQ5KXWODYMvzb15NqKhkXfFjV7RW9oHeVIg0U=.73940200-8be6-4427-9348-44d50fd22286@github.com> References: <2TAhYdUQ5KXWODYMvzb15NqKhkXfFjV7RW9oHeVIg0U=.73940200-8be6-4427-9348-44d50fd22286@github.com> Message-ID: On Thu, 23 May 2024 22:19:46 GMT, David Holmes wrote: >> src/hotspot/share/classfile/classLoader.cpp line 1477: >> >>> 1475: >>> 1476: jlong ClassLoader::class_init_count() { >>> 1477: return (UsePerfData) ? _perf_classes_inited->get_value() : -1; >> >> No need to add brackets here > > Surely this needs to be checking `ProfileClassLinkage`, which in turn should be false if `UsePerfData` is false. If `UsePerfData` is set to false, `ProfileClassLinkage` is set to false in arguments.cpp: 3761 if (ProfileClassLinkage && !UsePerfData) { 3762 if (!FLAG_IS_DEFAULT(ProfileClassLinkage)) { 3763 warning("Disabling ProfileClassLinkage since UsePerfData is turned off."); 3764 FLAG_SET_DEFAULT(ProfileClassLinkage, false); 3765 } 3766 } I will remove the extra parentheses. 
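The flag interaction discussed above can be sketched as a small standalone model. This is only an illustration, not the code in arguments.cpp: `StartupFlags`, `Setting` and `effective_profile_class_linkage` are hypothetical names, and the sketch assumes that `ProfileClassLinkage` defaults to false.

```cpp
#include <cassert>

// Hypothetical tri-state: how ProfileClassLinkage was given on the command line.
enum class Setting { Default, On, Off };

// Hypothetical, simplified stand-in for the three inputs discussed in the thread.
struct StartupFlags {
  bool use_perf_data;              // models UsePerfData
  Setting profile_class_linkage;   // models -XX:+/-ProfileClassLinkage
  bool log_init_enabled;           // models -Xlog:init
};

// Effective value of ProfileClassLinkage after argument processing,
// under the assumption that its built-in default is false.
inline bool effective_profile_class_linkage(const StartupFlags& f) {
  // Ergonomic step: -Xlog:init turns the flag on only if the user did not set it.
  bool enabled = (f.profile_class_linkage == Setting::On) ||
                 (f.profile_class_linkage == Setting::Default && f.log_init_enabled);
  // Dependency step: the flag cannot stay on without UsePerfData.
  return enabled && f.use_perf_data;
}
```

In this model the implication only goes one way: an effective `ProfileClassLinkage` of true implies `UsePerfData`, but `UsePerfData` alone says nothing about `ProfileClassLinkage`, which is the point raised in the review about which flag the accessors should test.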
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612774406 From stuefe at openjdk.org Fri May 24 06:16:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 06:16:11 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 12:41:46 GMT, Johan Sjölen wrote: > > We claim that: > > > Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. > > > > > > May I ask how you ran it? I would like to be able to reproduce our claim. > > Sure, it was a while since I ran the benchmark. You're going to have to do a bit of work here, to get it working. > > You take this file: https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L1 > > And you port it to the VMATree instead of VMADict (or whatever it's called). Then you run it and look at output. You could also take one of the stress tests that I made, remove the verification calls, and run the same stress test for VirtualMemoryTracker. The claim also makes sense if you think about it. A binary tree will always grossly outperform a linked list for sorted insert/delete. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2128629567 From stuefe at openjdk.org Fri May 24 06:28:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 06:28:13 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: <7mvAVR2Qfa10hYSXXxaL1yXpq6qbvvXFtqu-9-unCCk=.3802b0a1-8bc6-4f89-844a-affa2bf1788b@github.com> Message-ID: On Thu, 23 May 2024 16:39:40 GMT, Johan Sjölen wrote: >> This doesn't really test the state, nor the stack. It also seems to be a lot of code for a single-use test. Similar in other tests. > Sorry, I don't understand what you mean by this.
You test that after committing, we have four nodes. You don't test that we have reserved-committed-reserved, nor the associated stacks (all equal), nor the flags. The test would be more interesting if you were to verify the region data, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612895057 From djelinski at openjdk.org Fri May 24 06:34:10 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Fri, 24 May 2024 06:34:10 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 19:26:10 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268: >> >>> 266: __ cmpq(needle_len_p, 0); >>> 267: __ jg_b(L_nextCheck); >>> 268: __ xorq(rax, rax); >> >> out of curiosity, is there any advantage to using `xorq` instead of `xorl` here? >> >> https://stackoverflow.com/a/33668295/7707617 suggests that `xorl` might be better, but it's a bit dated now. > > Thanks for finding this. It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't. Fixed. Right, it seems to surprise people. There are a lot of preexisting uses of xorq / xorptr to zero a register. I think it would make sense to implement this logic in xorq. I can do this if others agree. >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 449: >> >>> 447: __ cmpq(r13, NUMBER_OF_CASES - 1); >>> 448: __ ja(L_smallCaseDefault); >>> 449: __ mov64(r15, (int64_t)small_jump_table); >> >> would it make sense to use `lea` here? > > It may, but I believe the movq is shorter (although maybe not to r15). I'll do some experimentation. the RIP-relative lea should have a shorter encoding.
I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested) >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 803: >> >>> 801: __ movq(index, needle_len); >>> 802: __ andq(index, 0xf); // nLen % 16 >>> 803: __ movq(offset, 0x10); >> >> `movl` or `movptr` would produce a shorter encoding > > I tried to be consistent with the whole {q,l} syntax throughout when referring to each symbolic register. I feel that changing this would ripple through the code. @sviswa7 what do you think? Right, that makes sense. I wonder if there's any reason why the logic to select the best mov variant is in movptr, and not in movq. Usually the `ptr` functions just select the `l` or `q` overload depending on the target system, `movptr` is an exception here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612907959 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908115 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612908219 From stuefe at openjdk.org Fri May 24 06:35:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 06:35:12 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: <7mvAVR2Qfa10hYSXXxaL1yXpq6qbvvXFtqu-9-unCCk=.3802b0a1-8bc6-4f89-844a-affa2bf1788b@github.com> Message-ID: On Fri, 24 May 2024 06:25:20 GMT, Thomas Stuefe wrote: >> I'm going to take some time to digest these ideas. I'm not a fan of the string-based approach, I much prefer longer but more obvious code. The latter approach might work out. Still, a general tool is to me not preferable. The typical case is probably not that you need to read every test case, but a specific one which fails. Having some repetition in other tests shouldn't bother you then, but having to jump into a generalized DSL for state assertions do. >> >>>This doesn't really test the state, nor the stack. 
It also seems to be a lot of code for a single-use test. Similar in other tests. >> >> Sorry, I don't understand what you mean by this. > >>> This doesn't really test the state, nor the stack. It also seems to be a lot of code for a single-use test. Similar in other tests. > >> Sorry, I don't understand what you mean by this. > > > You test that after committing, we have four nodes. You don't test that we have reserved-committed-reserved, nor the associated stacks (all equal), nor the flags. The test would be more interesting if you were to verify the region data, too. > I'm going to take some time to digest these ideas. I'm not a fan of the string-based approach, I much prefer longer but more obvious code. The latter approach might work out. Still, a general tool is to me not preferable. The typical case is probably not that you need to read every test case, but a specific one which fails. Having some repetition in other tests shouldn't bother you then, but having to jump into a generalized DSL for state assertions do. But is this really so far away from what you are doing, just a lot less code? The smaller the code, the smaller the barrier against new tests. And I could think off-head of a lot of tests missing. Thinking about it, 24 different region states is probably too much. If we go for, say, 12 states, that increases the chance of merging in the random tests. So you pre-define, constant, 12 different region data with clear names. Those you use to reserve/commit, and to check the tree state. I would really like this VMA tree be tested well. One reason is that its compelling to reuse it in other scenarios. In fact, I am eagerly awaiting your push so that I can play with it e.g. in Metaspace, or in ZGC. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612904208 From dholmes at openjdk.org Fri May 24 07:02:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 May 2024 07:02:03 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <2TAhYdUQ5KXWODYMvzb15NqKhkXfFjV7RW9oHeVIg0U=.73940200-8be6-4427-9348-44d50fd22286@github.com> Message-ID: On Fri, 24 May 2024 05:20:48 GMT, Calvin Cheung wrote: >> Surely this needs to be checking `ProfileClassLinkage`, which in turn should be false if `UsePerfData` is false. > > If `UsePerfData` is set to false, `ProfileClassLinkage` is set to false in arguments.cpp: > > > 3761 if (ProfileClassLinkage && !UsePerfData) { > 3762 if (!FLAG_IS_DEFAULT(ProfileClassLinkage)) { > 3763 warning("Disabling ProfileClassLinkage since UsePerfData is turned off."); > 3764 FLAG_SET_DEFAULT(ProfileClassLinkage, false); > 3765 } > 3766 } > > > I will remove the extra parentheses. Yes but if `UsePerfData` is true it doesn't mean `ProfileClassLinkage` is true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1612951635 From djelinski at openjdk.org Fri May 24 07:18:01 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Fri, 24 May 2024 07:18:01 GMT Subject: RFR: 8332724: x86 MacroAssembler may over-align code [v3] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 18:15:27 GMT, Daniel Jeli?ski wrote: >> The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many. This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. 
>> >> This PR addresses the problem by using bit operations to calculate the required number of bytes. >> >> Tier1-3 tests passed. >> >> On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. > > Daniel Jeliński has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32-bit compilation Thanks for the reviews. Unless someone objects, I'll integrate this later today. Re revisiting flag types, it seems that the only options currently available are bool, int, intx and uintx; other types might [make jvmci unhappy](https://github.com/openjdk/jdk/blob/af056c1676dab3b0b35666a8259db60f9bbf824e/src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp#L294).
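The over-alignment described in the quoted PR text can be reproduced outside the assembler with plain integer arithmetic. The sketch below is not the MacroAssembler code; `padding_with_remainder` and `padding_with_mask` are hypothetical helpers that assume the target address has been truncated to a signed 32-bit value and that the modulus is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Padding computed with the signed remainder. In C++ the remainder takes the
// sign of the dividend, so a negative target yields a negative remainder and
// the result overshoots by one full modulus.
inline int padding_with_remainder(int modulus, int target) {
  int rem = target % modulus;
  return rem != 0 ? modulus - rem : 0;
}

// Padding computed with bit operations on the unsigned representation.
// Only valid when modulus is a power of two.
inline int padding_with_mask(int modulus, int target) {
  uint32_t mask = static_cast<uint32_t>(modulus - 1);
  return static_cast<int>((0u - static_cast<uint32_t>(target)) & mask);
}
```

For a target of -31 and a modulus of 32, the remainder-based helper asks for 63 bytes of padding while the mask-based helper asks for 31, which matches the 63 bytes of NOPs reported for a single `align32` call. For non-negative targets both helpers agree.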
------------- PR Comment: https://git.openjdk.org/jdk/pull/19378#issuecomment-2128797682 From aboldtch at openjdk.org Fri May 24 07:47:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 24 May 2024 07:47:04 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v4] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Thu, 23 May 2024 12:49:16 GMT, Amit Kumar wrote: >> s390x port for recursive locking. >> >> testing: >> - [x] build fastdebug-vm >> - [x] build slowdebug-vm >> - [x] build release-vm >> - [x] build optimized-vm >> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (release-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm) >> - [x] with C1 >> - [x] with C2 >> - [x] with interpreter >> - [x] tier1 with fastdebug-vm >> - [x] tier1 with slowdebug-vm >> - [x] tier1 with release-vm >> >> *BenchMarks*: >> >> Results from Performance LPARs : >> >> >> Locking Mode = 1 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.144 ? 0.035 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ? 89.475 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ? 0.559 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ? 3.036 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ? 1.793 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> Locking Mode = 1 (with patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 5.146 ? 0.027 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ? 75.863 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ? 
0.519 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ? 2.103 ns/op >> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ? 2.229 ns/op >> Finished running test 'micro:vm.lang.LockUnlock' >> >> >> >> >> Locking Mode = 2 (without Patch) >> >> Benchmark (innerCount) Mode Cnt Score Error Units >> LockUnlock.testContendedLock 100 avgt 12 4.688 ? 0.051 ns/op >> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ? 92.265 ns/op >> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ? 2.229 ns/op >> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ? 0.416 ns/op >> LockUnlock.te... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > minor code formatting & variable renamings Looks good. Have someone with better s390x knowledge look at this as well. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18878#pullrequestreview-2076109904 From alanb at openjdk.org Fri May 24 08:05:01 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 24 May 2024 08:05:01 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 23:24:16 GMT, Chen Liang wrote: > Hmm, actually, looking at the specs of the method again, does it imply that Proxy classes are never unloaded once defined in a ClassLoader, as seen in `Proxy::getProxyClass`: It's not specified, Proxy pre-dates hidden classes although its Proxy did require some changes to specify that it can't be a proxy to a hidden class. Given the getProxyClass is deprecated then it may be better to have it work the same way as it has always done. If Proxy::newInstanceClass is changed to return an instance of a hidden class then spec changes are needed. Maybe too early to think about that now as there is a lot of analysis work required to do before going near code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2128853043 From stuefe at openjdk.org Fri May 24 08:06:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 08:06:17 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:12:44 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. 
>> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Lower number of pages Looked at native callstack storage, memfiletracker, some other stuff. Will continue next week. src/hotspot/share/nmt/memReporter.cpp line 879: > 877: } > 878: > 879: void MemDetailReporter::report_physical_devices() { Can we rename this? This is really not about physical devices. "report_address_spaces" ? Also not better. Sigh. "report_anonymous_kernel_memory_allocated_on_behalf_of_ZGC_with_memfd_that_we_dont_see_in_rss_ever" ? But seriously, maybe "report_memory_file_allocations"? That would work. src/hotspot/share/nmt/memoryFileTracker.cpp line 50: > 48: for (int i = 0; i < mt_number_of_types; i++) { > 49: VirtualMemory* summary = device->_summary.by_type(NMTUtil::index_to_flag(i)); > 50: summary->reserve_memory(diff.flag[i].reserve); Why do we only track reserved memory here? 
src/hotspot/share/nmt/memoryFileTracker.cpp line 51: > 49: VirtualMemory* summary = device->_summary.by_type(NMTUtil::index_to_flag(i)); > 50: summary->reserve_memory(diff.flag[i].reserve); > 51: } This seems to be a recurring pattern. - Please make VMATree::SummaryDiff a first-class NMT class. Maybe rename it, too, since it does not necessarily have anything to do with Summary reports. Maybe something like like "MemoryDeltas", being an array of "MemoryDelta". Up to you. - Then give us something like "VirtualMemorySnapshot.apply_delta(MemoryDeltas)". That saves code, and when reading its callsite, your intent is much clearer. src/hotspot/share/nmt/memoryFileTracker.cpp line 66: > 64: stream->cr(); > 65: VMATree::TreapNode* prev = nullptr; > 66: device->_tree.visit_in_order([&](VMATree::TreapNode* current) { Does MemoryFileTracker really need to be friend to access the tree? It only needs read-only access to the tree, nothing else. Why not expose a ro access to the tree? I balk at the many friend relationships. To my eye, they undermine the encapsulation. I can see the point for test classes, but here? src/hotspot/share/nmt/memoryFileTracker.cpp line 72: > 70: return; > 71: } > 72: assert(prev->val().out.type() == current->val().in.type(), "must be"); Slight modification, since I expect we will stare at the output of this function to analyse broken trees. Please keep record of "brokenness" and assert at the end only. And print out the current number of the mapping, too. Then, on assert, print out "tree broken first at record XXX". 
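The "keep a record of brokenness and assert at the end" suggestion above can be sketched as follows. This is a standalone illustration rather than NMT code: `Record` and `print_and_find_first_break` are hypothetical names, and a plain vector stands in for the in-order tree visit.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a tree node's incoming and outgoing state.
struct Record {
  int in_state;
  int out_state;
};

// Prints every record and returns the index of the first record whose
// in-state does not match its predecessor's out-state, or -1 if none does.
// Nothing aborts during the walk, so the full output is available for analysis.
inline int print_and_find_first_break(const std::vector<Record>& records) {
  int first_broken = -1;
  for (size_t i = 0; i < records.size(); i++) {
    std::printf("record %zu: in=%d out=%d\n", i, records[i].in_state, records[i].out_state);
    if (i > 0 && first_broken < 0 &&
        records[i - 1].out_state != records[i].in_state) {
      first_broken = static_cast<int>(i);  // remember only the first mismatch
    }
  }
  if (first_broken >= 0) {
    std::printf("tree broken first at record %d\n", first_broken);
  }
  return first_broken;
}
```

A caller would assert that the return value is -1 only after the walk has finished, so a broken structure still gets printed in full before the assertion fires.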
src/hotspot/share/nmt/memoryFileTracker.cpp line 157: > 155: void MemoryFileTracker::summary_snapshot(VirtualMemorySnapshot* snapshot) const { > 156: for (int d = 0; d < _devices.length(); d++) { > 157: auto& device = _devices.at(d); Lets make this a type, and const: `const MemoryFile*` src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 61: > 59: : _stack_index(-1) { > 60: } > 61: }; If you want to retain StackIndex as class (there is no pressing need, could be just as well a simple typedef to a 32bit int), let's make it work for its money. E.g., in StackIndex(i), assert i >= 0. Maybe give it an is_invalid() method, and replace manual comparisons with -1 with index.is_invalid(). My obsessive compulsive mind nags at wasting half of the value range for "invalid". Granted, that only matters should we want to shrink the index to 16bit. Still, you could make the index uint32_t and declare UINT_MAX to be the invalid value. I know part of the reason is that GrowableArray is hardcoded to signed int as index, but no need for that to define the index type here. src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 80: > 78: } > 79: link = link->next; > 80: } Good. We do an youngest-first search if I am seeing right. Was that deliberate? The chance of the most recent callstacks reoccurring is a lot higher than seeing older stacks. src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 88: > 86: > 87: // For storage of the Links > 88: Arena _arena; I like that we store the links in arena, it saves space and mallocs. Only thing to remember is that now we will see "Arena" memory in the NMT category when printing the report. src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 92: > 90: // 4099 gives a 50% probability of collisions at 76 stacks (as per birthday problem). > 91: static const constexpr int default_nr_buckets = 4099; > 92: int _nr_buckets; isn't this normally called table_size or somesuch? _nr_buckets sounds like number of items, which this is not. 
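The figure in the quoted source comment (4099 buckets, 50% collision probability at 76 stacks) can be checked with a short calculation. The sketch below assumes stacks hash uniformly and independently into the buckets; `keys_for_collision_probability` is a hypothetical helper, not part of NMT.

```cpp
#include <cassert>

// Smallest number of uniformly hashed keys for which the probability of at
// least one bucket collision reaches the given threshold (birthday problem).
inline int keys_for_collision_probability(int buckets, double threshold) {
  double p_no_collision = 1.0;
  for (int n = 1; n <= buckets; n++) {
    // The n-th key has to avoid the n-1 buckets that are already occupied.
    p_no_collision *= static_cast<double>(buckets - (n - 1)) / buckets;
    if (1.0 - p_no_collision >= threshold) {
      return n;
    }
  }
  return buckets + 1;  // by pigeonhole, a collision is certain at this point
}
```

Under the uniform-hashing assumption, `keys_for_collision_probability(4099, 0.5)` evaluates to 76, in agreement with the comment. The common approximation sqrt(2 * buckets * ln 2) gives about 75.4, which rounds up to the same value.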
src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 95: > 93: Link** _buckets; > 94: GrowableArrayCHeap _stacks; > 95: bool _is_detailed_mode; _is_detailed: I somehow don't think this rather low level class should care about and copy the MemTracker state. I like "one truth only", which is MemTracker::enabled. I'd rather see this handled at the call site. If we only need it to prevent allocation of the bucket table at construction time, I'd allocate that one with malloc. src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 119: > 117: } > 118: } > 119: }; Possibly for follow up RFE: I would like to see number of stacks in the NMT statistic (there is this statistic subcommand to the jcmd VM.native_memory). I also would like to see those statistics in the hs-err file. For example, if we ever decide to track larger stacks (which would make a lot of sense, 4 frames is really not much), we will see a logarithmic (?) increase in number of stacks. I would like to know those numbers. Note that I sometimes do that during investigations, and I have a RFE open somewhere to make the number of frames in stacks tunable with a VM options. src/hotspot/share/nmt/vmatree.hpp line 146: > 144: struct SingleDiff { > 145: int64_t reserve; > 146: int64_t commit; The typical type would be `ssize_t`, not int64. Apart from clarity, I am not sure how int64 would work on 32-bit. test/hotspot/gtest/nmt/test_nmt_memoryfiletracker.cpp line 53: > 51: TEST_VM_F(MemoryFileTrackerTest, Basics) { > 52: this->basics(); > 53: } Curious, just a question. You like using fixture classes even if not necessary. Why not write the test directly into a TEST_VM ? 
------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2075935829 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612991632 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613035196 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613015946 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613023874 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613027118 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613029549 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612953300 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612946445 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612964659 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612971831 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612960587 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612986848 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613018290 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1612912160 From stuefe at openjdk.org Fri May 24 08:06:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 08:06:17 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 07:25:01 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 119: > >> 117: } >> 118: } >> 119: }; > > Possibly for follow up RFE: I would like to see number of stacks in the NMT statistic (there is this statistic subcommand to the jcmd VM.native_memory). I also would like to see those statistics in the hs-err file. 
> > For example, if we ever decide to track larger stacks (which would make a lot of sense, 4 frames is really not much), we will see a logarithmic (?) increase in number of stacks. I would like to know those numbers. Note that I sometimes do that during investigations, and I have a RFE open somewhere to make the number of frames in stacks tunable with a VM options. Note: mid-term we should place *all* stacks in here, not just those for tracking ZGC. And replace all physical copies of stacks with StackIndex. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613003650 From stuefe at openjdk.org Fri May 24 08:15:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 08:15:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 07:46:33 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > src/hotspot/share/nmt/memoryFileTracker.cpp line 51: > >> 49: VirtualMemory* summary = device->_summary.by_type(NMTUtil::index_to_flag(i)); >> 50: summary->reserve_memory(diff.flag[i].reserve); >> 51: } > > This seems to be a recurring pattern. > > - Please make VMATree::SummaryDiff a first-class NMT class. Maybe rename it, too, since it does not necessarily have anything to do with Summary reports. Maybe something like like "MemoryDeltas", being an array of "MemoryDelta". Up to you. > - Then give us something like "VirtualMemorySnapshot.apply_delta(MemoryDeltas)". > > That saves code, and when reading its callsite, your intent is much clearer. Wait, I made a small thinking error here.
If you do this, MemoryDeltas should carry deltas for both reserved and committed counters, and therefore is tied to the VirtualMemory case, and should probably be named VirtualMemoryDelta (or Diff or whatever). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613056414 From stuefe at openjdk.org Fri May 24 08:23:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 08:23:17 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 08:12:41 GMT, Thomas Stuefe wrote: >> src/hotspot/share/nmt/memoryFileTracker.cpp line 51: >> >>> 49: VirtualMemory* summary = device->_summary.by_type(NMTUtil::index_to_flag(i)); >>> 50: summary->reserve_memory(diff.flag[i].reserve); >>> 51: } >> >> This seems to be a recurring pattern. >> >> - Please make VMATree::SummaryDiff a first-class NMT class. Maybe rename it, too, since it does not necessarily have anything to do with Summary reports. Maybe something like "MemoryDeltas", being an array of "MemoryDelta". Up to you. >> - Then give us something like "VirtualMemorySnapshot.apply_delta(MemoryDeltas)". >> >> That saves code, and when reading its callsite, your intent is much clearer. > > Wait, I made a small thinking error here. If you do this, MemoryDeltas should carry deltas for both reserved and committed counters, and therefore is tied to the VirtualMemory case, and should probably be named VirtualMemoryDelta (or Diff or whatever). You know what, I'll leave this up to you. We can streamline this in later RFEs.
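The suggestion above (a first-class delta type carrying both reserved and committed deltas, plus an `apply_delta` on the snapshot) could look roughly like the following. This is only a sketch under stated assumptions: `VirtualMemoryDelta`, `VirtualMemoryDeltas`, `Counter`, `SnapshotSketch` and `kNumFlags` are hypothetical stand-ins invented for illustration, not types from the patch, and the real `VirtualMemorySnapshot` interface in HotSpot is different.

```c++
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical number of memory flags; the real count comes from MEMFLAGS.
constexpr std::size_t kNumFlags = 4;

// One signed delta per flag, for both reserved and committed bytes.
struct VirtualMemoryDelta {
  std::int64_t reserve = 0;
  std::int64_t commit = 0;
};

using VirtualMemoryDeltas = std::array<VirtualMemoryDelta, kNumFlags>;

// Hypothetical per-flag counter pair.
struct Counter {
  std::size_t reserved = 0;
  std::size_t committed = 0;
};

// Hypothetical snapshot: applies every per-flag delta in one call, so call
// sites no longer need their own loop over the flags.
class SnapshotSketch {
  std::array<Counter, kNumFlags> _by_type{};

 public:
  void apply_delta(const VirtualMemoryDeltas& deltas) {
    for (std::size_t i = 0; i < kNumFlags; i++) {
      Counter& c = _by_type[i];
      // Assumes counters stay below 2^63 so the signed arithmetic is exact.
      c.reserved = static_cast<std::size_t>(
          static_cast<std::int64_t>(c.reserved) + deltas[i].reserve);
      c.committed = static_cast<std::size_t>(
          static_cast<std::int64_t>(c.committed) + deltas[i].commit);
    }
  }

  const Counter& by_type(std::size_t i) const { return _by_type[i]; }
};
```

With something of this shape, the loop quoted from memoryFileTracker.cpp would collapse to a single `apply_delta` call at each call site, which is the readability gain the review comment is after.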
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613071212 From azafari at openjdk.org Fri May 24 08:54:08 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 08:54:08 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 12:38:40 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/os/posix/os_posix.cpp line 386: > >> 384: if (begin_offset > 0) { >> 385: if (os::release_memory(extra_base, begin_offset)) >> 386: { > > The `{` should be moved to the line above. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613115512 From azafari at openjdk.org Fri May 24 09:01:02 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:01:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 12:41:25 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/os/posix/os_posix.cpp line 387: > >> 385: if (os::release_memory(extra_base, begin_offset)) >> 386: { >> 387: ThreadCritical tc; > > In many of the functions we put the `ThreadCritical` inside the `MemTracker` after the `enabled()` check, but we don't do it here. Why is that? Shouldn't the `ThreadCritical` usage be hidden inside `MemTracker`? I have already tried to move `ThreadCritical` into the `MemTracker` (in another PR), but it failed. AFAIR, the unmapping/releasing the memory should be in critical section too. The current implementation follows this order: 1) create critical section 2) unmap/release 3) if successful, call MemTracker. The step 2) should be in critical section. 
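The ordering described above (enter the critical section, release, and only then update the tracker on success) can be sketched as follows. This is an illustration only: `g_critical`, `g_tracked`, `fake_release` and `release_and_untrack` are hypothetical stand-ins for `ThreadCritical`, NMT's region bookkeeping, `os::release_memory` and the caller. The motivation noted in the comments is an assumption; the thread does not state why the release has to sit inside the critical section.

```c++
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical stand-in for ThreadCritical: one process-wide lock.
static std::mutex g_critical;

// Hypothetical stand-in for the tracker's list of reserved regions.
static std::set<const void*> g_tracked;

// Hypothetical stand-in for os::release_memory; fails for a null base.
static bool fake_release(const void* base, std::size_t /*size*/) {
  return base != nullptr;
}

bool release_and_untrack(const void* base, std::size_t size) {
  // 1) Enter the critical section before touching the mapping.
  std::lock_guard<std::mutex> critical(g_critical);

  // 2) Release while holding the lock. Assumed motivation: no other thread
  //    can reserve and record the same range between the release and the
  //    bookkeeping update below.
  if (!fake_release(base, size)) {
    return false;  // nothing was released, so the tracker stays untouched
  }

  // 3) Only after a successful release, update the tracker.
  g_tracked.erase(base);
  return true;
}
```

Hiding the lock inside the tracker alone would cover step 3 but leave step 2 outside the critical section, which matches the concern raised in the reply.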
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613124698 From jsjolen at openjdk.org Fri May 24 09:02:39 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 24 May 2024 09:02:39 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v106] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocate memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to these. The basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > ``` > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment.
> } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > ``` > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle; we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with 12 additional commits since the last revision: - Rename all uses of device in MemoryFileTracker to file - Make type explicit - Use expect_node_count and other utilities to cut down on repetitive code - Refactor NCS and SI into the test fixture class - Move out auto lambdas to functions in the fixture class - Move using to global scope - Use two stacks, four memflags - Use is_aligned - Only four candidate flags - Rename Tpe to Type - ...
and 2 more: https://git.openjdk.org/jdk/compare/80605766...f99190f0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/80605766..f99190f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=105 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=104-105 Stats: 270 lines in 5 files changed: 74 ins; 78 del; 118 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Fri May 24 09:02:39 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 24 May 2024 09:02:39 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 07:48:29 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > src/hotspot/share/nmt/vmatree.hpp line 146: > >> 144: struct SingleDiff { >> 145: int64_t reserve; >> 146: int64_t commit; > > The typical type would be `ssize_t`, not int64. > > Apart from clarity, I am not sure how int64 would work on 32-bit. That doesn't seem right to me. `ssize_t` only has a guaranteed range of `[-1, SSIZE_MAX]`, the -1 being there for errors. We need as full a range of negative numbers as possible. Good question regarding 32-bit, will have to think about that one. Btw: Yes, I know, we can underflow or overflow the diff, but in practice no one will allocate `2**64` bytes, so I am willing to take that risk.
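To make the signed-width argument concrete, here is a minimal sketch of computing and applying a byte-count diff with `int64_t`. The helpers `diff_of` and `apply_diff` are hypothetical names invented for this illustration and are not part of vmatree.hpp. The sketch assumes counters stay below 2^63, in line with the "in practice" caveat above.

```c++
#include <cstddef>
#include <cstdint>

// int64_t is exactly 64 bits wide on every target that provides it,
// including 32-bit ones where size_t is only 32 bits.
static_assert(sizeof(std::int64_t) == 8, "int64_t is 64 bits everywhere");

// Hypothetical helper: signed difference between two byte counts.
// On a 32-bit target any difference of two size_t values fits; on a 64-bit
// target this assumes both counts are below 2^63.
inline std::int64_t diff_of(std::size_t before, std::size_t after) {
  return static_cast<std::int64_t>(after) - static_cast<std::int64_t>(before);
}

// Hypothetical helper: apply a signed diff to an unsigned counter.
inline std::size_t apply_diff(std::size_t counter, std::int64_t diff) {
  return static_cast<std::size_t>(static_cast<std::int64_t>(counter) + diff);
}
```

For example, shrinking a counter from 4096 to 1024 bytes yields a diff of -3072, and applying that diff to 4096 gives back 1024. A type whose only guaranteed negative value is -1 could not portably carry that intermediate result.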
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613124297 From jsjolen at openjdk.org Fri May 24 09:06:04 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 24 May 2024 09:06:04 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Wed, 22 May 2024 12:09:14 GMT, Afshin Zafari wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. 
Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. >> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fixed the missing parts of shenandoahHeap.cpp >2- In order to have adjacent regions with different flags, CDS reserves a (large) region R and then splits it into sub regions R1 and R2 (R == <---R1---><--R2-->). At release time, NMT tracks only R and ignores releasing R1 and R2. This ignoring is problematic when a requested region R is size-aligned to R1---R---R2 first and then the R1 and R2 are released (chop_extra_memory function is called for this). In this case, NMT ignores tracking R1 and R2 with false assumption that a containing R will be released. Therefore, R1 and R2 remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. Thank you for the in-depth explanation, I think I understand it. What was the fix for this issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2129009945 From kbarrett at openjdk.org Fri May 24 09:15:11 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 24 May 2024 09:15:11 GMT Subject: RFR: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array' [v2] In-Reply-To: References: <-visGzw1GeoT6b35zj5l6Ii-m1BpS_slOuVOlVgWmqs=.679e3dd7-f22d-44eb-9cd3-24352ef82f92@github.com> Message-ID: <4EJgeUbmOlGVPiSxp2QHxTSwQAhxjleoV0w0XSF9kFw=.25cd64e8-bf46-4871-95e7-d14ca74a1c9e@github.com> On Thu, 23 May 2024 07:48:42 GMT, Matthias Baesken wrote: > > Aside, I thought there was supposed to be a blank in between concatenated strings because some compiler complained. 
> > It is the same at a lot of places in the file so I did not change it here . A drive-by followup: `"..."identifier` is syntactically a user-defined literal, so we need a space to dodge that syntax. For simplicity, the "rule" that has been adopted is to always separate string literals from adjacent code. But apparently not here... ------------- PR Comment: https://git.openjdk.org/jdk/pull/19349#issuecomment-2129031431 From azafari at openjdk.org Fri May 24 09:17:01 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:17:01 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 11:57:17 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/cds/metaspaceShared.cpp line 1088: > >> 1086: #endif // ASSERT >> 1087: >> 1088: if (archive_space_rs.is_reserved()) { > > We've already asserted that this should be true, so this if should not be needed. I had to add these, since the `log_info(cds)` calls caused the assertions in the `ReservedSpace` getters raised (IIRC, even during jdk-build). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613148798 From ayang at openjdk.org Fri May 24 09:18:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 24 May 2024 09:18:03 GMT Subject: RFR: 8332745: Method::is_vanilla_constructor is never used In-Reply-To: References: Message-ID: <1jM7Dh4PY8lgOo9FD9JDubsRcnQ-OVT1FSaOQCF9ShA=.caf1a16e-50f4-4cd6-aa1a-56d05bbecff1@github.com> On Thu, 23 May 2024 13:00:49 GMT, Dan Heidinga wrote: > Removed dead code related to identifying empty constructors. Missed when [JDK-8057777](https://bugs.openjdk.org/browse/JDK-8057777) cleaned up JVM_AllocateNewObject. > > Passes mach5 tier1. Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19367#pullrequestreview-2076340437 From azafari at openjdk.org Fri May 24 09:23:01 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:23:01 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 12:10:17 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/cds/metaspaceShared.cpp line 1341: > >> 1339: } else { >> 1340: if (use_archive_base_addr && base_address != nullptr) { >> 1341: total_space_rs = ReservedSpace(total_range_size, archive_space_alignment, > > Can you explain why you changed this? 
> > It's also interesting that after this change we only use `base_address_alignment` in asserts. I think this indicates that something should be cleaned up / fixed here. That comes after merge with mainline. I trusted the tests in tiers 1-5 for the correctness of this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613163280 From azafari at openjdk.org Fri May 24 09:29:02 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:29:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 12:30:41 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/cds/metaspaceShared.cpp line 1370: > >> 1368: ccs_begin_offset, mtClassShared, mtClass); >> 1369: } >> 1370: assert(archive_space_rs.is_reserved(), "Archive space is not reserved."); > > Something is dubious about the code above: > > archive_space_rs = total_space_rs.first_part(ccs_begin_offset, > (size_t)archive_space_alignment); > class_space_rs = total_space_rs.last_part(ccs_begin_offset); > MemTracker::record_virtual_memory_split_reserved(total_space_rs.base(), total_space_rs.size(), > ccs_begin_offset, mtClassShared, mtClass); > > > In one path `total_space_rs` gets initialized with `mtClass` and in another path it gets initialized with `mtClassShared`. 
This means that we always get the wrong flag in one of `archive_space_rs` and `class_space_rs`. The logic is that, a large region `total_space_rs` is reserved and then is split into two sub regions. It doesn't matter what is the flag for `total_space_rs`. At split time the flags are set correctly for sub regions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613171792 From azafari at openjdk.org Fri May 24 09:32:07 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:32:07 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 12:44:42 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/memory/virtualspace.hpp line 63: > >> 61: // it should not change after. >> 62: // * _alignment - Not to be changed after initialization >> 63: // * _executable - Not to be changed after initialization > > I think this would be a good change to do in the future, but currently this isn't true. `clear_members` do clear these fields, so I think you should remove these two lines. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613175723 From mdoerr at openjdk.org Fri May 24 09:34:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 24 May 2024 09:34:02 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: References: Message-ID: On Thu, 23 May 2024 14:11:36 GMT, Martin Doerr wrote: > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? > How can we verify it? By comparing the performance using the micro benchmarks? @theRealAph: It would be great if you could take a look and see if you can spot any bug. Especially, I wonder why `r_array_length` happens to be 0 in some cases, but x86 doesn't check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2129071450 From azafari at openjdk.org Fri May 24 09:39:02 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:39:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 12:45:37 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/memory/virtualspace.hpp line 76: > >> 74: >> 75: MEMFLAGS nmt_flag() const { assert(is_reserved(), "Memory region is not 
reserved."); assert(_flag != mtNone, "Memory flag is not set."); return _flag; } >> 76: > > Looking at this again, and realize that this function should probably be moved to the other accessors below. Done. > src/hotspot/share/memory/virtualspace.hpp line 98: > >> 96: bool special() const { assert(is_reserved(), "Memory region is not reserved."); return _special; } >> 97: bool executable() const { assert(is_reserved(), "Memory region is not reserved."); return _executable; } >> 98: size_t noaccess_prefix() const { assert(is_reserved(), "Memory region is not reserved."); return _noaccess_prefix; } > > FWIW, this change comes from one of my debugging sessions. I think it is good to have these asserts, I just wish they could says something like `assert(is_initialized(), ...)` to more clearly convey why we are doing this check. > > We are considering if there are ways to split ReservedSpace into two classes, one that handles reserving of memory and one that is a plain view of already reserved memory. If/when we do such a change we could consider updating these asserts to be more legible. > > In the meantime, it would be nice to change the string to "Fields not initialized" (and get rid of the `.`). Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613185089 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613185323 From azafari at openjdk.org Fri May 24 09:54:03 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:54:03 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 13:04:24 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 506: > >> 504: return true; >> 505: assert(reserved_rgn->end() == rgn.end() || reserved_rgn->base() == rgn.base(), "extra memory should be at either end of the region."); >> 506: } > > This seems like an extreme hack. I understand that this just follows the tradition of the rest of the hacks in this file, but can't this be better handled in the CDS layer above? TBH, I don't like it too. Unfortunately, chopping extra memory is done at `os:xxx` layer and reporting the case back to CDS would need to pass all the chopping info up to CDS. In addition, it is valid in CDS that a region is partitioned into sub regions and releasing sub regions can be silently and correctly ignored. 
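The `R1---R---R2` layout that results from size-aligning a reservation can be sketched with plain address arithmetic. This is an illustration under stated assumptions, not the code in os_posix.cpp: `ChopResult` and `chop_extents` are hypothetical names, the alignment is assumed to be a power of two, and the over-sized reservation is assumed to be `size + alignment` bytes.

```c++
#include <cstddef>
#include <cstdint>

// Hypothetical result type: the aligned region R plus the sizes of the
// leading (R1) and trailing (R2) pieces that get released again.
struct ChopResult {
  std::uintptr_t r_base;  // aligned start of R
  std::size_t r1_size;    // bytes released before R
  std::size_t r2_size;    // bytes released after R
};

// Computes where R sits inside the over-sized reservation
// [extra_base, extra_base + extra_size). Assumes `alignment` is a power of
// two and extra_size >= size + (r_base - extra_base).
inline ChopResult chop_extents(std::uintptr_t extra_base, std::size_t extra_size,
                               std::size_t size, std::size_t alignment) {
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  const std::uintptr_t r_base = (extra_base + mask) & ~mask;  // align up
  const std::size_t r1 = static_cast<std::size_t>(r_base - extra_base);
  const std::size_t r2 = static_cast<std::size_t>(
      (extra_base + extra_size) - (r_base + size));
  return ChopResult{r_base, r1, r2};
}
```

In this model `R1` and `R2` are released individually while `R` stays reserved, so a tracker that ignores those two releases on the assumption that a containing region will be released later ends up with stale entries for them. That is the situation the special case in virtualMemoryTracker.cpp works around.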
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613202175 From azafari at openjdk.org Fri May 24 09:57:01 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 09:57:01 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Thu, 23 May 2024 11:58:15 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/cds/metaspaceShared.cpp line 1092: > >> 1090: p2i(archive_space_rs.base()), p2i(archive_space_rs.end()), archive_space_rs.size()); >> 1091: } >> 1092: if (class_space_rs.is_reserved()) { > > `class_space_rs.is_reserved()` is asserted if `if (Metaspace::using_class_space())` is taken. I think this could be changed to: > Suggestion: > > if (Metaspace::using_class_space()) { Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613208902 From azafari at openjdk.org Fri May 24 10:03:27 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 10:03:27 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v3] In-Reply-To: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> Message-ID: > This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: > 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. > Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. > > 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. 
> > Tests: > mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: applied review comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19343/files - new: https://git.openjdk.org/jdk/pull/19343/files/86ae1e37..c7ff3867 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19343&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19343&range=01-02 Stats: 16 lines in 3 files changed: 2 ins; 5 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/19343.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19343/head:pull/19343 PR: https://git.openjdk.org/jdk/pull/19343 From azafari at openjdk.org Fri May 24 10:03:27 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 10:03:27 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Wed, 22 May 2024 12:09:14 GMT, Afshin Zafari wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. 
>> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. >> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fixed the missing parts of shenandoahHeap.cpp Thank @stefank for your comments. Many of them applied, a few remained open for discussion. ------------- PR Review: https://git.openjdk.org/jdk/pull/19343#pullrequestreview-2076445838 From duke at openjdk.org Fri May 24 10:06:17 2024 From: duke at openjdk.org (kuaiwei) Date: Fri, 24 May 2024 10:06:17 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v4] In-Reply-To: References: Message-ID: > he origin patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It show regression in some platform, like Apple silicon in mac os > 2 Can not handle instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > It can be fixed by: > 1 Enable AlwaysMergeDMB by default, only disable it in architecture we can see performance improvement (N1 or N2) > 2 Check the special pattern and merge the subsequent dmb. > > It also fix a bug when code buffer is expanding, st/ld/dmb can not be merged. I added unit tests for these. > > This patch still has a unhandled case. 
Insts like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and can not merge all three. Because when emitting dmb.ish, if merge all previous dmbs, the code buffer will shrink the size. I think it may break some resumption and think it's not a common pattern. > > In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Refine merge dmb test cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19278/files - new: https://git.openjdk.org/jdk/pull/19278/files/6214b435..00262c4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=02-03 Stats: 7806 lines in 1 file changed: 149 ins; 7641 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278 PR: https://git.openjdk.org/jdk/pull/19278 From duke at openjdk.org Fri May 24 10:06:17 2024 From: duke at openjdk.org (kuaiwei) Date: Fri, 24 May 2024 10:06:17 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: <9h-ta3XTnzioy3Ghdeulm6FgZYDJb2y5mDdMLGw3oYc=.defe7ef1-15dd-451d-8b79-3688c1e7a1da@github.com> Message-ID: <7wmLib5O_jaY1BISU-z4HrjGsHYF0DA-tAHpj5UzQQo=.3d770e9a-3759-42b2-aa4a-8cfb47715974@github.com> On Thu, 23 May 2024 16:02:21 GMT, Aleksey Shipilev wrote: >> Right. I was implicitly thinking that we can do this without coding the explicit patterns into the test. As it stands now, it is hard to check that generated patterns are actually correct. Let me see if I can whip up a sample of what I had in mind. > > I was thinking about this: [improve-tests.patch](https://github.com/openjdk/jdk/files/15419452/improve-tests.patch). 
Note how it uses the constants for better readability, and also runs the test in both `AlwaysMergeDMB` modes. You might want to adapt other tests to similar pattern. Thanks for your patch. I patched and add few sanity check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1613219759 From stuefe at openjdk.org Fri May 24 10:17:02 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 10:17:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Wed, 22 May 2024 12:09:14 GMT, Afshin Zafari wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). 
In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. >> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > fixed the missing parts of shenandoahHeap.cpp Good analysis, @afshin-zafari. However, I keep thinking we are stop-gap fixing holes in a badly designed system. The problem is two-fold: 1) NMT assumes reserves and commits to be different layers and, e.g., for committed regions to be fully contained in a reserved region. This is wrong and does not reflect the realities of mmap. We can overlay and overlap any reservation/committing/uncommitting/releasing in any way we want. The right way to track virtual memory regions is what we do now in https://github.com/openjdk/jdk/pull/18289 with src/hotspot/share/nmt/vmatree.hpp. See [1] for the (simple) theory behind it. Not only would this be a lot faster and simpler, but it would also be less error-prone since it does not assume any kind of layering between reservations and committing memory. With the VMATree, releasing a whole region would remove all containing regions automatically. With Johan's PR we now will do this for ZGC memory file allocations. However, we should also use the same technique to track VirtualMemory in NMT. Then, errors like this will disappear. 2) Another problem is that ReservedSpace assumes ownership of the underlying memory. On Windows, we cannot split regions allocated with VirtualAlloc. So ReservedRegions are assigned a MEMFLAG at construction, and we can never split up the region because Windows. 
Therefore, to have a contiguous region with different sub-regions and different flags in NMT, NMT forces us to allocate them in two steps, side by side, with all that can go wrong. This is suboptimal. NMT is a simple tracker; it should not dictate how we allocate memory but be able to accommodate any way we want. I don't have a good solution in my head for (2). It is also the less urgent problem, I think. [1] https://gist.github.com/tstuefe/d9682b7f11b3375da27faa100f45e621 src/hotspot/share/cds/metaspaceShared.cpp line 1169: > 1167: // Set up compressed Klass pointer encoding: the encoding range must > 1168: // cover both archive and class space. > 1169: assert(class_space_rs.is_reserved(), "Memory region should be reserved."); Not necessary. This is already checked in reserve_address_space_for_archives, and in Metaspace::initialize_class_space. ------------- PR Review: https://git.openjdk.org/jdk/pull/19343#pullrequestreview-2076383403 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613180561 From azafari at openjdk.org Fri May 24 10:17:03 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 10:17:03 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Fri, 24 May 2024 09:03:07 GMT, Johan Sjölen wrote: > Thank you for the in-depth explanation, I think I understand it. What was the fix for this issue? As can be seen in `os_posix.cpp::chop_extra_memory`, the exceptional case (releasing sub-regions due to extra memory) is reported to `MemTracker` so that it can handle it. The optional `bool extra_memory` arg of `MemTracker::record_virtual_memory_release()` is used to address this case.
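As a rough illustration of the over-reserve-then-chop pattern discussed in this thread, the sketch below computes which head and tail pieces would be released around an aligned sub-range. It is only arithmetic under stated assumptions: `Range`, `ChopPlan` and `plan_chop` are hypothetical names made up for this example, it does not call `mmap` or any HotSpot API, and the real `chop_extra_memory` may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration types; these are NOT HotSpot types.
struct Range {
  uintptr_t base;
  size_t    size;
};

struct ChopPlan {
  Range head;  // extra memory in front of the aligned range (R1 in the thread)
  Range kept;  // the aligned range that is actually used (R)
  Range tail;  // extra memory behind the aligned range (R2 in the thread)
};

// Given an over-sized reservation [extra_base, extra_base + extra_size),
// pick the first `alignment`-aligned sub-range of `size` bytes and describe
// the leftover head and tail that would have to be released.
// Assumption: `alignment` is a power of two.
inline ChopPlan plan_chop(uintptr_t extra_base, size_t extra_size,
                          size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const uintptr_t aligned =
      (extra_base + alignment - 1) & ~static_cast<uintptr_t>(alignment - 1);
  const size_t begin_offset = aligned - extra_base;
  assert(begin_offset + size <= extra_size);
  const size_t end_offset = extra_size - begin_offset - size;
  return ChopPlan{ {extra_base, begin_offset},
                   {aligned, size},
                   {aligned + size, end_offset} };
}
```

A tracker that records only the whole over-sized reservation has to be told about `head` and `tail` separately when they are released, which is the situation the `extra_memory` argument is meant to signal.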
------------- PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2129155345 From stefank at openjdk.org Fri May 24 10:17:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 24 May 2024 10:17:04 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 09:12:16 GMT, Afshin Zafari wrote: >> src/hotspot/share/cds/metaspaceShared.cpp line 1088: >> >>> 1086: #endif // ASSERT >>> 1087: >>> 1088: if (archive_space_rs.is_reserved()) { >> >> We've already asserted that this should be true, so this if should not be needed. > > I had to add these, since the `log_info(cds)` calls caused the assertions in the `ReservedSpace` getters raised (IIRC, even during jdk-build). Can you show the error message? >> src/hotspot/share/cds/metaspaceShared.cpp line 1370: >> >>> 1368: ccs_begin_offset, mtClassShared, mtClass); >>> 1369: } >>> 1370: assert(archive_space_rs.is_reserved(), "Archive space is not reserved."); >> >> Something is dubious about the code above: >> >> archive_space_rs = total_space_rs.first_part(ccs_begin_offset, >> (size_t)archive_space_alignment); >> class_space_rs = total_space_rs.last_part(ccs_begin_offset); >> MemTracker::record_virtual_memory_split_reserved(total_space_rs.base(), total_space_rs.size(), >> ccs_begin_offset, mtClassShared, mtClass); >> >> >> In one path `total_space_rs` gets initialized with `mtClass` and in another path it gets initialized with `mtClassShared`. This means that we always get the wrong flag in one of `archive_space_rs` and `class_space_rs`. 
> > The logic is that, a large region `total_space_rs` is reserved and then is split into two sub regions. It doesn't matter what is the flag for `total_space_rs`. At split time the flags are set correctly for sub regions. The flags sent to the NMT subsystem is correct, but the flags recorded in the ReservedSpaces will be wrong, AFAIKT. You can probably verify that by adding asserts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613235966 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613237903 From stuefe at openjdk.org Fri May 24 10:17:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 May 2024 10:17:05 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 09:20:13 GMT, Afshin Zafari wrote: >> src/hotspot/share/cds/metaspaceShared.cpp line 1341: >> >>> 1339: } else { >>> 1340: if (use_archive_base_addr && base_address != nullptr) { >>> 1341: total_space_rs = ReservedSpace(total_range_size, archive_space_alignment, >> >> Can you explain why you changed this? >> >> It's also interesting that after this change we only use `base_address_alignment` in asserts. I think this indicates that something should be cleaned up / fixed here. > > That comes after merge with mainline. > I trusted the tests in tiers 1-5 for the correctness of this change. No, I think this is wrong. I changed it with https://github.com/openjdk/jdk/pull/19152. Please be careful, this part is rather tricky, and a lot of thought went into this. 
And I am pretty sure we don't cover all possible code paths in tests. Please also note that I am working on adding no-access zones for the Klass Encoding range (see https://github.com/openjdk/jdk/pull/19290), which may impact these regions too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613191595 From stefank at openjdk.org Fri May 24 10:38:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 24 May 2024 10:38:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v3] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> Message-ID: On Fri, 24 May 2024 10:03:27 GMT, Afshin Zafari wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. 
Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. >> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > applied review comments. Changes requested by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19343#pullrequestreview-2076525088 From stefank at openjdk.org Fri May 24 10:38:03 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 24 May 2024 10:38:03 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 08:57:57 GMT, Afshin Zafari wrote: >> src/hotspot/os/posix/os_posix.cpp line 387: >> >>> 385: if (os::release_memory(extra_base, begin_offset)) >>> 386: { >>> 387: ThreadCritical tc; >> >> In many of the functions we put the `ThreadCritical` inside the `MemTracker` after the `enabled()` check, but we don't do it here. Why is that? Shouldn't the `ThreadCritical` usage be hidden inside `MemTracker`? > > I have already tried to move `ThreadCritical` into the `MemTracker` (in another PR), but it failed. AFAIR, the unmapping/releasing the memory should be in critical section too. The current implementation follows this order: 1) create critical section 2) unmap/release 3) if successful, call MemTracker. The step 2) should be in critical section. Hmm. 
os::release_memory also calls `record_virtual_memory_release`, and then this code calls it again with a second ThreadCritical, but then it is called again with `extra_memory`. I still find this addition of `extra_memory` highly dubious. >> src/hotspot/share/nmt/virtualMemoryTracker.cpp line 506: >> >>> 504: return true; >>> 505: assert(reserved_rgn->end() == rgn.end() || reserved_rgn->base() == rgn.base(), "extra memory should be at either end of the region."); >>> 506: } >> >> This seems like an extreme hack. I understand that this just follows the tradition of the rest of the hacks in this file, but can't this be better handled in the CDS layer above? > > TBH, I don't like it too. Unfortunately, chopping extra memory is done at `os:xxx` layer and reporting the case back to CDS would need to pass all the chopping info up to CDS. In addition, it is valid in CDS that a region is partitioned into sub regions and releasing sub regions can be silently and correctly ignored. I'd like to take an extra look at that before this PR gets integrated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613259695 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613262335 From mli at openjdk.org Fri May 24 10:52:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 10:52:07 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:55:35 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. 
>> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed more comments > - Fixed comments src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1681: > 1679: } > 1680: > 1681: void MacroAssembler::movptr1(Register Rd, uint64_t imm64, int32_t &offset) { Original code of MacroAssembler:: movptr(...) is bit tricky at `upper -= lower;` to understand for me, and I think new MacroAssembler:: movptr2(...) uses the similar way at `lower30 - low12`. I can add some comment later to help future understanding. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1703: > 1701: assert_different_registers(Rd, tmp, noreg); > 1702: > 1703: uint32_t upper18 = (addr >> 30ull); literal suffix `ull` could be removed? And `uint32_t upper18 = (addr >> 30ull);` + `lui(tmp, upper18 << 12);` could be replaced with `lui(tmp, addr >> 18);`? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1706: > 1704: int32_t lower30 = (addr & 0x3fffffffu); > 1705: int32_t low12 = (lower30 << 20) >> 20; > 1706: int32_t mid18 = ((lower30 - low12) >> 12); Similar here. `mid18 = ((lower30 - low12) >> 12);` and `lui(Rd, mid18 << 12);` could be replaced with `lui(Rd, lower30 - low12);`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613279908 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613247627 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613257147 From mli at openjdk.org Fri May 24 10:58:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 10:58:04 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:55:35 GMT, Robbin Ehn wrote: >> Hi, please consider! 
>> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. >> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed more comments > - Fixed comments And a general question about `nativeInst_riscv.cpp` and `macroAssembler_riscv.cpp`. I see that the functions in these two files call each other, which makes the code a bit messy to me. It's not an issue introduced in this PR. I wonder if this could be refactored? If so, I can work on it. I'm asking in case you already have an easy answer, so that I don't have to investigate further. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2129240699 From rehn at openjdk.org Fri May 24 11:30:06 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 24 May 2024 11:30:06 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 15:04:22 GMT, Fei Yang wrote: > Updated change looks good. It would be nice to see how much this will benefit performance. I tried to do some benchmarks, but it seems their error is larger than the benefit. I'll try to do some longer runs and minimize the error.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2129297347 From coleenp at openjdk.org Fri May 24 11:50:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 24 May 2024 11:50:05 GMT Subject: RFR: JDK-8325841 - Remove unused references to vmSymbols.hpp In-Reply-To: References: Message-ID: On Thu, 23 May 2024 21:51:32 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? > > As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. This looks good. Hotspot rules are that you need two reviewers before integrating. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19374#pullrequestreview-2076689162 From azafari at openjdk.org Fri May 24 11:51:02 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 11:51:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 10:32:17 GMT, Stefan Karlsson wrote: >> I have already tried to move `ThreadCritical` into the `MemTracker` (in another PR), but it failed. AFAIR, the unmapping/releasing the memory should be in critical section too. The current implementation follows this order: 1) create critical section 2) unmap/release 3) if successful, call MemTracker. The step 2) should be in critical section. > > Hmm. 
os::release_memory also calls `record_virtual_memory_release`, and then this code calls it again with a second ThreadCritical, but then it is called again with `extra_memory`. I still find this addition of `extra_memory` highly dubious. Some facts: - `MemTracker::record_virtual_memory_release()` has no `ThreadCritical` internally and therefore should be called inside a critical section. - When `os::release_memory()` returns, the `ThreadCritical` that is created there is destroyed and a new one should be created again here. - Releasing a sub-region that flagged for CDS and is contained in a larger CDS region is ignored at `MemTracker::record_virtual_memory_release()`. It is a valid case due to the way that CDS reserves and/or releases regions. - This exceptional case is notified to `MemTracker` by passing `true` as `extra_memory`. - Inside `MemTracker`, the `extra_memory == true` is used in the places where the exceptional case should/would be addressed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613352945 From cslucas at openjdk.org Fri May 24 11:58:05 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 24 May 2024 11:58:05 GMT Subject: Integrated: JDK-8325841 - Remove unused references to vmSymbols.hpp In-Reply-To: References: Message-ID: On Thu, 23 May 2024 21:51:32 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove unused names from `vmSymbols.hpp`? > > As far as I can tell there is nothing in the code base using these symbols. My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. This pull request has now been integrated. 
Changeset: 5a2ba952 Author: Cesar Soares Lucas Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/5a2ba952b120394d7cc0d0890619780c1c27a078 Stats: 57 lines in 2 files changed: 0 ins; 56 del; 1 mod 8325841: Remove unused references to vmSymbols.hpp Reviewed-by: kvn, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/19374 From azafari at openjdk.org Fri May 24 12:01:02 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 12:01:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 10:34:56 GMT, Stefan Karlsson wrote: >> TBH, I don't like it too. Unfortunately, chopping extra memory is done at `os:xxx` layer and reporting the case back to CDS would need to pass all the chopping info up to CDS. In addition, it is valid in CDS that a region is partitioned into sub regions and releasing sub regions can be silently and correctly ignored. > > I'd like to take an extra look at that before this PR gets integrated. I agree and will work on it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613367242 From rehn at openjdk.org Fri May 24 12:20:06 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 24 May 2024 12:20:06 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 10:23:49 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fixed more comments >> - Fixed comments > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1703: > >> 1701: assert_different_registers(Rd, tmp, noreg); >> 1702: >> 1703: uint32_t upper18 = (addr >> 30ull); > > literal suffix `ull` could be removed? > And `uint32_t upper18 = (addr >> 30ull);` + `lui(tmp, upper18 << 12);` could be replaced with `lui(tmp, addr >> 18);`? **ull** is there to say that this should be a 64-bit logical shift; apparently it is not doing its job. The reason it's done in two steps is that first we get the values, then we adjust them for our implementation of the instructions. Our lui() shifts the immediate value down, so we must shift it up beforehand. When we patch lui, however, we use the value without shifting: `unsigned int upper18 = (addr >> 30ull);` `Assembler::patch(instruction_address + (NativeInstruction::instruction_size * 0), 31, 12, (upper18 & 0xfffff));` So it can be replaced, but then we lose the calculation that shows why it's correct. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1706: > >> 1704: int32_t lower30 = (addr & 0x3fffffffu); >> 1705: int32_t low12 = (lower30 << 20) >> 20; >> 1706: int32_t mid18 = ((lower30 - low12) >> 12); > > Similar here. `mid18 = ((lower30 - low12) >> 12);` and `lui(Rd, mid18 << 12);` could be replaced with `lui(Rd, lower30 - low12);`? Same reason as above.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613388456 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613389486 From jsjolen at openjdk.org Fri May 24 12:26:16 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 24 May 2024 12:26:16 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 08:02:03 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > src/hotspot/share/nmt/memoryFileTracker.cpp line 50: > >> 48: for (int i = 0; i < mt_number_of_types; i++) { >> 49: VirtualMemory* summary = device->_summary.by_type(NMTUtil::index_to_flag(i)); >> 50: summary->reserve_memory(diff.flag[i].reserve); > > Why do we only track reserved memory here? This is clearly a point of confusion, as Gerard also asked about this. The answer is that the MFT only cares about memory in a file, which is always considered committed... So we consider reserved memory to be committed. Yeah, let's just change it so that MFT always commits memory instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613395782 From jsjolen at openjdk.org Fri May 24 12:31:28 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 24 May 2024 12:31:28 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v107] In-Reply-To: References: Message-ID: <3u9JYI8Khnzq5UibXCSI1YXMI7Gi5RzAsAYiICJnxcQ=.79b6c499-6b5a-4d32-9038-dac21691f073@github.com> > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space.
This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
> > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Make position and PositionComparator public - Just use commit directly and not reserved in MFT - Fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/f99190f0..4aaa0927 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=106 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=105-106 Stats: 28 lines in 4 files changed: 1 ins; 2 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Fri May 24 12:37:21 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 24 May 2024 12:37:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 12:23:23 GMT, Johan Sjölen wrote: >> src/hotspot/share/nmt/memoryFileTracker.cpp line 50: >> >>> 48: for (int i = 0; i < mt_number_of_types; i++) { >>> 49: VirtualMemory* summary = device->_summary.by_type(NMTUtil::index_to_flag(i)); >>> 50: summary->reserve_memory(diff.flag[i].reserve); >> >> Why do we only track reserved memory here? > > This is clearly a point of confusion, as Gerard also asked about this. The answer is that the MFT only cares about memory in a file, which is always considered committed...
So we consider reserved memory to be committed. Yeah, let's just change it so that MFT always commits memory instead. Switched it around a bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613409163 From jsjolen at openjdk.org Fri May 24 12:37:21 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 24 May 2024 12:37:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 08:20:20 GMT, Thomas Stuefe wrote: >> Wait, I made a small thinking error here. If you do this, MemoryDeltas should carry delta for both reserved and committed counters, and therefore is tied to the VirtualMemory case, and should probably be named VirtualMemoryDelta (or Diff or whatever) > > You know what, I leave this up to you. We can streamline this in later RFEs. Yeah, this does sound like a refactoring in a future RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613407935 From jsjolen at openjdk.org Fri May 24 12:37:22 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 24 May 2024 12:37:22 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: <0w6COcy8MtVQVVXynGMg9FfNu-Wtwou99Y0UhD1nwTk=.5839be0b-cd9b-4964-80c8-1a58de36dc37@github.com> On Fri, 24 May 2024 07:53:09 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > src/hotspot/share/nmt/memoryFileTracker.cpp line 66: > >> 64: stream->cr(); >> 65: VMATree::TreapNode* prev = nullptr; >> 66: device->_tree.visit_in_order([&](VMATree::TreapNode* current) { > > Does MemoryFileTracker really need to be friend to access the tree? It only needs read-only access to the tree, nothing else. Why not expose a ro access to the tree?
> > I balk at the many friend relationships. To my eye, they undermine the encapsulation. I can see the point for test classes, but here? It's not a friend, only the test is. But I do agree, the general API w.r.t. encapsulation is a bit poorly defined right now. It's a bit difficult to nail the public vs private API whilst the code changes so frequently. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1613406858 From rehn at openjdk.org Fri May 24 12:39:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 24 May 2024 12:39:05 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 10:48:36 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fixed more comments >> - Fixed comments > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1681: > >> 1679: } >> 1680: >> 1681: void MacroAssembler::movptr1(Register Rd, uint64_t imm64, int32_t &offset) { > > The original code of MacroAssembler::movptr(...) is a bit tricky for me to understand at `upper -= lower;`, and I think the new MacroAssembler::movptr2(...) uses a similar approach at `lower30 - low12`. > I can add some comments later to help future understanding. Yes, it's very confusing with a 12-bit *signed* immediate: 11 value bits + 1 sign bit. Therefore we need to use an arithmetic shift to create the "12-bit signed value" (hence they do not use 12ull, to signal this is an arithmetic shift). So if we need to set bit 12, this value will be negative, plus the bit pattern we want in the other 11 bits. To compensate for that we need to lui() a larger value, and that value would be low30 - low12. If low12 is positive it's just removing those bits; otherwise we also need to add 4096 (bit 12). Hence low30 - -low12 would instead add to low30, so the mid18 can be extended to 19 bits.
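The compensation described above can be checked with a small model in plain C++. This is only a sketch of the arithmetic, not the HotSpot `movptr2()` code: `model_lui`, `model_addi` and `materialize48` are hypothetical names, and the model assumes a 48-bit address and the `lui + lui + slli + add + addi` sequence from the PR description.

```cpp
#include <cassert>
#include <cstdint>

// Model of RV64 `lui`: the 20-bit immediate lands in bits 31..12 and the
// 32-bit result is sign-extended to 64 bits.
static int64_t model_lui(uint32_t imm20) {
  assert(imm20 < (1u << 20));
  return static_cast<int64_t>(static_cast<int32_t>(imm20 << 12));
}

// Model of RV64 `addi`: adds a sign-extended 12-bit immediate.
static int64_t model_addi(int64_t rs, int32_t simm12) {
  assert(simm12 >= -2048 && simm12 <= 2047);
  return rs + simm12;
}

// Rebuilds a 48-bit address the way the five-instruction sequence would.
static uint64_t materialize48(uint64_t addr) {
  assert((addr >> 48) == 0);
  const uint32_t upper18 = static_cast<uint32_t>(addr >> 30);
  const int32_t lower30 = static_cast<int32_t>(addr & 0x3fffffffu);
  // Sign-extend the low 12 bits: this is what addi will actually add.
  const int32_t low12 = ((lower30 & 0xfff) ^ 0x800) - 0x800;
  // Compensate: when low12 is negative, this rounds lower30 up by 4096.
  // The result is a multiple of 4096 in [0, 2^30], so it needs at most
  // 19 immediate bits and never reaches the sign bit of lui's immediate.
  const int32_t mid = lower30 - low12;

  uint64_t tmp = static_cast<uint64_t>(model_lui(upper18));   // lui  tmp, upper18
  int64_t rd = model_lui(static_cast<uint32_t>(mid >> 12));   // lui  rd, mid
  tmp <<= 18;                                                 // slli tmp, tmp, 18
  rd += static_cast<int64_t>(tmp);                            // add  rd, rd, tmp
  rd = model_addi(rd, low12);                                 // addi rd, rd, low12
  return static_cast<uint64_t>(rd);
}
```

For example, for an address whose low 12 bits are `0x800`, `low12` becomes -2048, `mid` is rounded up by 4096, and the final `addi` brings the value back down to the requested address.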
The 20th bit is the sign bit, so we must stay away from it; otherwise lui() will sign-extend the 19+1 bit imm value. I don't think that helped, but I tried at least :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613411682 From rehn at openjdk.org Fri May 24 12:45:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 24 May 2024 12:45:04 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 10:55:27 GMT, Hamlin Li wrote: > And a general question about `nativeInst_riscv.cpp` and `macroAssembler_riscv.cpp`. I saw that the functions in these 2 files call each other, which makes the code a bit messy to me. It's not an issue introduced in this PR. I wonder if this could be refactored? If so, I can work on it. But I ask just in case you already have an easy answer, so I don't have to do further investigation. I agree, I would prefer having classes for the instructions where all the instruction functionality would be. As it is now, the opcodes are repeated everywhere; instead they should just be in one place, this class. And then have classes for instruction sequences where we keep all functionality gathered. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2129438142 From duke at openjdk.org Fri May 24 12:56:10 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Fri, 24 May 2024 12:56:10 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v12] In-Reply-To: <27qGzorWxtdq6HLmIMPLHZ6_qRbOZo2DvA7pewZfNKA=.3f11daeb-1645-466e-b4bb-56aab62021b2@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> <27qGzorWxtdq6HLmIMPLHZ6_qRbOZo2DvA7pewZfNKA=.3f11daeb-1645-466e-b4bb-56aab62021b2@github.com> Message-ID: On Tue, 21 May 2024 06:08:54 GMT, Chris Plummer wrote: >> Lei Zaakjyu has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> restore > > test/hotspot/jtreg/runtime/cds/appcds/sharedStrings/SharedStringsHumongous.java line 90: > >> 88: // before dumping the string table. That means the heap should contain no >> 89: // humongous regions. >> 90: dumpOutput.shouldNotMatch("gc,region,cds. G1HeapRegion 0x[0-9a-f]* HUM"); > > Just a minor nit. I noticed a pre-existing typo on line 87 above. It says "kelp" instead of "kept". Can you fix it? ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18871#discussion_r1613437928 From tschatzl at openjdk.org Fri May 24 12:56:20 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 24 May 2024 12:56:20 GMT Subject: RFR: 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes Message-ID: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> Hi all, please review this change that avoids posting Free->Free and Old->Old region transitions in JFR. The reason for these could have been: * Free->Free: heap shrinking and full gc * Old->Old: heap shrinking, full gc or evacuation failure in an old region Parts of this change has been contributed by @ansteiner , crediting him for this (the first commit). Testing: tier1-3, tier5, all "detailed" JFR test cases Thanks, Thomas ------------- Commit messages: - Fix errorneous Old->Old transitions which actually were Free->Old. 
- JDK-8330577 Changes: https://git.openjdk.org/jdk/pull/19389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19389&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330577 Stats: 112 lines in 2 files changed: 110 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19389/head:pull/19389 PR: https://git.openjdk.org/jdk/pull/19389 From duke at openjdk.org Fri May 24 13:04:14 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Fri, 24 May 2024 13:04:14 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v13] In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: > follow up 8267941 Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - review - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694 - restore - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694 - review - Merge branch 'master' into JDK-8330694 - fix indentation - also tidy up - tidy up - rename ------------- Changes: https://git.openjdk.org/jdk/pull/18871/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18871&range=12 Stats: 1003 lines in 123 files changed: 1 ins; 4 del; 998 mod Patch: https://git.openjdk.org/jdk/pull/18871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18871/head:pull/18871 PR: https://git.openjdk.org/jdk/pull/18871 From azafari at openjdk.org Fri May 24 13:13:04 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 13:13:04 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> 
<_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 10:14:41 GMT, Stefan Karlsson wrote: >> The logic is that, a large region `total_space_rs` is reserved and then is split into two sub regions. It doesn't matter what is the flag for `total_space_rs`. At split time the flags are set correctly for sub regions. > > The flags sent to the NMT subsystem is correct, but the flags recorded in the ReservedSpaces will be wrong, AFAIKT. You can probably verify that by adding asserts. If your comment refers only to these lines of code, they are already verified. Since, inside the split function, the sub-regions get the new flags and all the reserved and committed amounts are moved from the large region to the new ones. So, the accounting of memory is correct. FWIW, if we trace down the call at line 1346 of `total_space_rs = Metaspace::reserve_address_space_for_compressed_classes(total_range_size, false /* optimize_for_zero_base */);` the region may get different flags of `mtClass` or `mtMetaspace` based on the checked criteria down there. If you comment on all such cases, then I will double check for them and add assertion for. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613462000 From tschatzl at openjdk.org Fri May 24 13:15:04 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 24 May 2024 13:15:04 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v13] In-Reply-To: References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: On Fri, 24 May 2024 13:04:14 GMT, Lei Zaakjyu wrote: >> follow up 8267941 > > Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: > > - review > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694 > - restore > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694 > - review > - Merge branch 'master' into JDK-8330694 > - fix indentation > - also tidy up > - tidy up > - rename Still good imo ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18871#pullrequestreview-2076897185 From mbaesken at openjdk.org Fri May 24 13:34:24 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 24 May 2024 13:34:24 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero Message-ID: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) Reason is that we do a float division by zero to get a signal . 
This is desired by us, so it is not really an error, but ubsan cannot know this. So add an attribute to this function to mark that it has undefined behavior. See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero). "Floating point division by zero. This is undefined per the C and C++ standards" ------------- Commit messages: - JDK-8332894 Changes: https://git.openjdk.org/jdk/pull/19394/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19394&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332894 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19394.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19394/head:pull/19394 PR: https://git.openjdk.org/jdk/pull/19394 From mli at openjdk.org Fri May 24 13:41:12 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 13:41:12 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 12:39:32 GMT, Robbin Ehn wrote: > > And a general question about `nativeInst_riscv.cpp` and `macroAssembler_riscv.cpp`. I saw that the functions in these 2 files call each other, which makes the code a bit messy to me. It's not an issue introduced in this PR. I wonder if this could be refactored? If so, I can work on it. But I ask just in case you already have an easy answer, so I don't have to do further investigation. > > I agree, I would prefer having classes for the instructions where all the instruction functionality would be. As it is now, the opcodes are repeated everywhere; instead they should just be in one place, this class. And then have classes for instruction sequences where we keep all functionality gathered. OK, let me do some further investigation to see if we can make it more readable and maintainable.
>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1681: >> >>> 1679: } >>> 1680: >>> 1681: void MacroAssembler::movptr1(Register Rd, uint64_t imm64, int32_t &offset) { >> >> The original code of MacroAssembler::movptr(...) is a bit tricky for me to understand at `upper -= lower;`, and I think the new MacroAssembler::movptr2(...) uses a similar approach at `lower30 - low12`. >> I can add some comments later to help future understanding. > > Yes, it's very confusing with a 12-bit *signed* immediate: 11 value bits + 1 sign bit. > Therefore we need to use an arithmetic shift to create the "12-bit signed value" > (hence they do not use 12ull, to signal this is an arithmetic shift). > > So if we need to set bit 12, this value will be negative, plus the bit pattern we want in the other 11 bits. > To compensate for that we need to lui() a larger value, and that value would be low30 - low12. > If low12 is positive it's just removing those bits; otherwise we also need to add 4096 (bit 12). > Hence low30 - -low12 would instead add to low30, so the mid18 can be extended to 19 bits. > > The 20th bit is the sign bit, so we must stay away from it; otherwise lui() will sign-extend the 19+1 bit imm value. > > I don't think that helped, but I tried at least :) Thanks for the explanation. It also took me a while to figure it out. I will try to add some comments for it later, and maybe also refactor movptr2() if I can find a way to make it clearer. >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1703: >> >>> 1701: assert_different_registers(Rd, tmp, noreg); >>> 1702: >>> 1703: uint32_t upper18 = (addr >> 30ull); >> >> literal suffix `ull` could be removed? >> And `uint32_t upper18 = (addr >> 30ull);` + `lui(tmp, upper18 << 12);` could be replaced with `lui(tmp, addr >> 18);`? > > **ull** is there to say it should be a 64-bit logical shift; apparently it is not doing its job. > The reason it's done in two steps is that first we get the values, then we adjust them for our implementation of the instructions.
> > Our lui() shifts down the immediate value so we must shift it up before. > While when we patch lui we use the value without shifting: > > unsigned int upper18 = (addr >> 30ull); > Assembler::patch(instruction_address + (NativeInstruction::instruction_size * 0), 31, 12, (upper18 & 0xfffff)); > > > So it can be replaced but then we are missing the calculation why it's correct, Thanks for explanation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2129563395 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613498640 PR Review Comment: https://git.openjdk.org/jdk/pull/19246#discussion_r1613498744 From sgibbons at openjdk.org Fri May 24 13:44:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 13:44:28 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v36] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Missing comma ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/c034d3f9..1a71eb10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=34-35 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 24 13:44:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 13:44:28 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] 
In-Reply-To: References: Message-ID: On Fri, 24 May 2024 06:31:36 GMT, Daniel Jeliński wrote: >> Thanks for finding this. It was ignorance on my part as I thought the xorq would have logic to not emit the REX prefix if not necessary, but it doesn't. Fixed. > > Right, it seems to surprise people. There's a lot of preexisting uses of xorq / xorptr to zero a register. I think it would make sense to implement this logic in xorq. I can do this if others agree. Good idea. I vote yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613506958 From sgibbons at openjdk.org Fri May 24 13:44:29 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 13:44:29 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Fri, 24 May 2024 00:47:04 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp > > test/jdk/java/lang/StringBuffer/IndexOf.java line 2: > >> 1: /* >> 2: * Copyright (c) 2000, 2024 Oracle and/or its affiliates. All rights reserved. > > This copyright header validation failure. Missing comma `,` after 2024. Fixed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613504949 From azafari at openjdk.org Fri May 24 13:46:15 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 13:46:15 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v4] In-Reply-To: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> Message-ID: > This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: > 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. > Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. > > 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. 
> > Tests: > mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: more fixes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19343/files - new: https://git.openjdk.org/jdk/pull/19343/files/c7ff3867..302e35ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19343&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19343&range=02-03 Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19343.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19343/head:pull/19343 PR: https://git.openjdk.org/jdk/pull/19343 From azafari at openjdk.org Fri May 24 13:46:15 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 13:46:15 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 10:12:53 GMT, Stefan Karlsson wrote: >> I had to add these, since the `log_info(cds)` calls caused the assertions in the `ReservedSpace` getters raised (IIRC, even during jdk-build). > > Can you show the error message? I could not reproduce the error. Removed the check. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613507173 From azafari at openjdk.org Fri May 24 13:46:15 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 13:46:15 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Fri, 24 May 2024 09:33:06 GMT, Thomas Stuefe wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed the missing parts of shenandoahHeap.cpp > > src/hotspot/share/cds/metaspaceShared.cpp line 1169: > >> 1167: // Set up compressed Klass pointer encoding: the encoding range must >> 1168: // cover both archive and class space. >> 1169: assert(class_space_rs.is_reserved(), "Memory region should be reserved."); > > Not necessary. Checked in reserve_address_space_for_archives, and in Metaspace::initialize_class_space Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613503505 From azafari at openjdk.org Fri May 24 13:46:15 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 13:46:15 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: On Fri, 24 May 2024 09:41:25 GMT, Thomas Stuefe wrote: >> That comes after merge with mainline. 
>> I trusted the tests in tiers 1-5 for the correctness of this change. > > No, I think this is wrong. I changed it with https://github.com/openjdk/jdk/pull/19152. > > Please be careful, this part is rather tricky, and a lot of thought went into this. And I am pretty sure we don't cover all possible code paths in tests. > > Please also note that I am working on adding no-access zones for the Klass Encoding range (see https://github.com/openjdk/jdk/pull/19290), which may impact these regions too. Overlooked in merge. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1613504094 From rehn at openjdk.org Fri May 24 13:50:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 24 May 2024 13:50:05 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 13:47:00 GMT, Hamlin Li wrote: > OK, let's move forward. :) I create some bugs to track the further work. https://bugs.openjdk.org/browse/JDK-8332899 https://bugs.openjdk.org/browse/JDK-8332900 > > Feel free to take them if you're also interested in them. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2129582674 From mli at openjdk.org Fri May 24 13:50:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 May 2024 13:50:04 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v6] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 10:55:35 GMT, Robbin Ehn wrote: >> Hi, please consider! >> >> Materializing a 48-bit pointer, using an additional register, we can do with: >> lui + lui + slli + add + addi >> This 15% faster both on VF2 and in CPU models, compared to movptr(). >> >> As we often materialize during calls there is free registers. >> >> I have choose just a few spot to use it, many more can use. >> E.g. la() with tmp register can use li48 instead of movptr. 
>> >> Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. >> And benchmarks when hardware is free. > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed more comments > - Fixed comments OK, let's move forward. :) I create some bugs to track the further work. https://bugs.openjdk.org/browse/JDK-8332899 https://bugs.openjdk.org/browse/JDK-8332900 Feel free to take them if you're also interested in them. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19246#pullrequestreview-2076984914 From azafari at openjdk.org Fri May 24 13:59:03 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 May 2024 13:59:03 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Fri, 24 May 2024 10:10:08 GMT, Thomas Stuefe wrote: > 1. NMT assumes reserves and commits to be different layers and, e.g., for committed regions to be fully contained in a reserved region. This is wrong and does not reflect the realities of mmap. We can overlay and overlap any reservation/committing/uncommitting/releasing in any way we want. On Windows, a commit without reserve is not allowed. 
([reference](https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc#:~:text=MEM_COMMIT%20%7C%20MEM_RESERVE.-,Attempting%20to%20commit%20a%20specific%20address%20range%20by%20specifying%20MEM_COMMIT%20without%20MEM_RESERVE%20and%20a%20non%2DNULL%20lpAddress%20fails%20unless%20the%20entire%20range%20has%20already%20been%20reserved.%20The%20resulting%20error%20code%20is%20ERROR_INVALID_ADDRESS.,-An%20attempt%20to)) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2129603425 From sgibbons at openjdk.org Fri May 24 14:22:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 14:22:11 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 06:31:40 GMT, Daniel Jeliński wrote: >> It may, but I believe the movq is shorter (although maybe not to r15). I'll do some experimentation. > > the RIP-relative lea should have a shorter encoding. I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested) Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code: `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0` The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative. I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing.
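For reference, the arithmetic behind a RIP-relative operand can be sketched independently of the HotSpot assembler. The helper below is hypothetical (`rip_relative_disp32` is not a HotSpot function) and it ignores everything the real `MacroAssembler` has to care about, such as reachability checks and relocation when the generated code is moved. It only shows the two facts that matter here: the 32-bit displacement is taken relative to the address of the next instruction, and the form is usable only when that distance fits in a signed 32-bit value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: computes the disp32 for a RIP-relative operand.
// insn_addr is where the instruction starts and insn_len is its full
// encoded length. Returns false if the target is out of disp32 range,
// in which case a 64-bit immediate load (movabs) would be needed.
bool rip_relative_disp32(uint64_t insn_addr, uint32_t insn_len,
                         uint64_t target, int32_t* disp) {
  assert(disp != nullptr);
  // The CPU adds the displacement to the address of the *next* instruction.
  const uint64_t next_insn = insn_addr + insn_len;
  const int64_t delta = static_cast<int64_t>(target - next_insn);
  if (delta < std::numeric_limits<int32_t>::min() ||
      delta > std::numeric_limits<int32_t>::max()) {
    return false;
  }
  *disp = static_cast<int32_t>(delta);
  return true;
}
```

With the addresses from the disassembly above, and assuming the usual 7-byte encoding of `lea r15, [rip + disp32]` (REX prefix, opcode, ModRM, disp32), the instruction at `0x00007fffe463d68b` would need a displacement of -242 to reach `0x7fffe463d5a0`, compared with the 10 bytes of the `movabs` that was emitted.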
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613560044 From ayang at openjdk.org Fri May 24 14:35:01 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 24 May 2024 14:35:01 GMT Subject: RFR: 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes In-Reply-To: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> References: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> Message-ID: On Fri, 24 May 2024 11:26:57 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that avoids posting Free->Free and Old->Old region transitions in JFR. > > The reason for these could have been: > * Free->Free: heap shrinking and full gc > * Old->Old: heap shrinking, full gc or evacuation failure in an old region > > Parts of this change has been contributed by @ansteiner , crediting him for this (the first commit). > > Testing: tier1-3, tier5, all "detailed" JFR test cases > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19389#pullrequestreview-2077095668 From asteiner at openjdk.org Fri May 24 14:50:05 2024 From: asteiner at openjdk.org (Andreas Steiner) Date: Fri, 24 May 2024 14:50:05 GMT Subject: RFR: 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes In-Reply-To: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> References: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> Message-ID: On Fri, 24 May 2024 11:26:57 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that avoids posting Free->Free and Old->Old region transitions in JFR. 
>
> The reason for these could have been:
> * Free->Free: heap shrinking and full gc
> * Old->Old: heap shrinking, full gc or evacuation failure in an old region
>
> Parts of this change has been contributed by @ansteiner , crediting him for this (the first commit).
>
> Testing: tier1-3, tier5, all "detailed" JFR test cases
>
> Thanks,
> Thomas

LGTM

-------------

Marked as reviewed by asteiner (Author).

PR Review: https://git.openjdk.org/jdk/pull/19389#pullrequestreview-2077139516

From djelinski at openjdk.org Fri May 24 14:52:12 2024
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Fri, 24 May 2024 14:52:12 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 14:19:13 GMT, Scott Gibbons wrote:

>> the RIP-relative lea should have a shorter encoding. I think something like `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested)
>
> Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code:
>
> `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0`
>
> The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative.
>
> I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing.

Thanks for checking. Well I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses. Maybe we could copy some of that code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613603833

From kvn at openjdk.org Fri May 24 15:25:11 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 24 May 2024 15:25:11 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 14:49:05 GMT, Daniel Jeliński wrote:

>> Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code:
>>
>> `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0`
>>
>> The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative.
>>
>> I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing.
>
> Thanks for checking. Well I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses. Maybe we could copy some of that code.

Use `lea` and `InternalAddress()` for referencing jump tables since the addresses are in the same code section.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613648648

From sgibbons at openjdk.org Fri May 24 15:32:26 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 15:32:26 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37]
In-Reply-To:
References:
Message-ID: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com>

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers:
>
>
> Benchmark                               Score      Latest
> StringIndexOf.advancedWithMediumSub     343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1     1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2     55.828     110.541    1.980027943x
> StringIndexOf.constantPattern           9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess     4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess   3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess    3.76       3.761      1.000265957x
> StringIndexOf.success                   9.186      9.713      1.057369911x
> StringIndexOf.successBig                14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String   7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char     16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String   7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245   0.987938803

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  mov64 => lea(InternalAddress)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16753/files
  - new: https://git.openjdk.org/jdk/pull/16753/files/1a71eb10..5d10a20b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=36
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=35-36

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/16753.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753

PR: https://git.openjdk.org/jdk/pull/16753

From cjplummer at openjdk.org Fri May 24 15:33:07 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Fri, 24 May 2024 15:33:07 GMT
Subject: RFR: 8330694: Rename 'HeapRegion' to
 'G1HeapRegion' [v13]
In-Reply-To:
References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com>
Message-ID:

On Fri, 24 May 2024 13:04:14 GMT, Lei Zaakjyu wrote:

>> follow up 8267941
>
> Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>
>  - review
>  - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694
>  - restore
>  - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694
>  - review
>  - Merge branch 'master' into JDK-8330694
>  - fix indentation
>  - also tidy up
>  - tidy up
>  - rename

Marked as reviewed by cjplummer (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/18871#pullrequestreview-2077242456

From sgibbons at openjdk.org Fri May 24 15:36:12 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 15:36:12 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 14:49:05 GMT, Daniel Jeliński wrote:

>> Just did the experiment and it turns out that `mov64(r15, (int64_t)small_jump_table)` and `lea(r15, ExternalAddress(small_jump_table))` produce exactly the same code:
>>
>> `0x00007fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 movabs r15,0x7fffe463d5a0`
>>
>> The code in `MacroAssembler` for `lea` calls `mov_literal64` with no check for whether it can be ip-relative.
>>
>> I tried doing it myself via `leaq(r15, Address(rip, (int64_t)small_jump_table - (int64_t)(__ pc())))` but there is no definition in `register_x86.hpp` for register `rip`. So I'm not sure exactly how to produce RIP-relative addressing.
>
> Thanks for checking. Well I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses. Maybe we could copy some of that code.

Changed to `lea` with `InternalAddress()`.
Generates the exact same code, but makes more sense. I looked at `movdqu` and see no code that generates RIP-relative loads. It merely checks `reachable()` and adds an intermediate `lea` if not reachable. @djelinski can you clarify please?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613665756

From duke at openjdk.org Fri May 24 15:55:07 2024
From: duke at openjdk.org (Lei Zaakjyu)
Date: Fri, 24 May 2024 15:55:07 GMT
Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v13]
In-Reply-To:
References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com>
Message-ID:

On Fri, 24 May 2024 13:04:14 GMT, Lei Zaakjyu wrote:

>> follow up 8267941
>
> Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>
>  - review
>  - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694
>  - restore
>  - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694
>  - review
>  - Merge branch 'master' into JDK-8330694
>  - fix indentation
>  - also tidy up
>  - tidy up
>  - rename

thanks for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18871#issuecomment-2129876471

From scott.gibbons at intel.com Fri May 24 16:01:13 2024
From: scott.gibbons at intel.com (Gibbons, Scott)
Date: Fri, 24 May 2024 16:01:13 +0000
Subject: Help with intrinsic testing for String.indexOf()
Message-ID:

Hi. I wrote a stub for implementing the indexOf method and am looking for a way to thoroughly test it. I have good tests for both positive and negative functionality that I'm pretty confident in. What I'm looking for is a good way to write a testcase to validate that I am not accessing memory outside the range of the strings passed to the stub.

What I'd like to do is to allocate an isolated page of memory such that accesses outside the page would cause a SIGSEGV.
I would like to allocate the string within the page such that the last character of the string is at the end of the page, and also allocate a string at the beginning of the page. That way I could get clear indications of reading either past the end of the string or before the beginning (I know there's a header, so the header would be allocated at the beginning).

Is there any method at all I can use within a Java testcase to effect such behavior? I've carefully performed code inspection and am relatively confident that I'm staying within the bounds of the strings but would like HW verification that I haven't missed anything.

Ideas?

Thanks,
--Scott Gibbons
Software Development Engineer, Runtime Engineering
DEVELOPER SOFTWARE ENGINEERING
Ph: 1-503-456-7756 Cell: 1-469-450-8390
Intel JF1, 2111 NE 25th Ave
Hillsboro, OR 97124
Intel Corporation | www.intel.com

From heidinga at openjdk.org Fri May 24 16:06:05 2024
From: heidinga at openjdk.org (Dan Heidinga)
Date: Fri, 24 May 2024 16:06:05 GMT
Subject: Integrated: 8332745: Method::is_vanilla_constructor is never used
In-Reply-To:
References:
Message-ID:

On Thu, 23 May 2024 13:00:49 GMT, Dan Heidinga wrote:

> Removed dead code related to identifying empty constructors. Missed when [JDK-8057777](https://bugs.openjdk.org/browse/JDK-8057777) cleaned up JVM_AllocateNewObject.
>
> Passes mach5 tier1.

This pull request has now been integrated.
Changeset: 6d2aeb82 Author: Dan Heidinga Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/6d2aeb82bc6f8b6894bf3777162be0efb2826397 Stats: 78 lines in 5 files changed: 0 ins; 76 del; 2 mod 8332745: Method::is_vanilla_constructor is never used Reviewed-by: coleenp, ayang ------------- PR: https://git.openjdk.org/jdk/pull/19367 From cslucas at openjdk.org Fri May 24 17:03:03 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 24 May 2024 17:03:03 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: On Fri, 24 May 2024 13:30:41 GMT, Matthias Baesken wrote: > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero 
to get a signal . This is desired by us so not really an error but ubsan cannot know this. > So add an attribute to this function that it has undefined behavior. > See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" I'm wondering if we shouldn't follow the pattern of creating another header file in `src/hotspot/share/sanitizers/` but otherwise LGTM. I'm not a reviewer, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19394#issuecomment-2130000962 From kvn at openjdk.org Fri May 24 18:02:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 18:02:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 15:32:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > mov64 => lea(InternalAddress) My testing for v34 passed without new failures. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2130096346 From kvn at openjdk.org Fri May 24 18:16:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 18:16:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 15:32:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit 
since the last revision: > > mov64 => lea(InternalAddress) I am fine with current version. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2077568604 From sgibbons at openjdk.org Fri May 24 18:16:14 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 18:16:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 17:59:49 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> mov64 => lea(InternalAddress) > > My testing for v34 passed without new failures. Thank you @vnkozlov . Waiting for review from @sviswa7 and @jatin-bhateja, then I'll integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2130114623 From duke at openjdk.org Fri May 24 18:30:14 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 24 May 2024 18:30:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 14:41:36 GMT, Scott Gibbons wrote: >> test/micro/org/openjdk/bench/java/lang/StringIndexOfHuge.java line 132: >> >>> 130: @Benchmark >>> 131: public int searchHugeLargeSubstring() { >>> 132: return dataStringHuge.indexOf("B".repeat(30) + "X" + "A".repeat(30), 74); >> >> .repeat() call and string concatenation shouldn't be part of the benchmark (here and several other @Benchmark functions in this file) since it will detract from the measurement. >> >> (String concatenation gets converted (by javac) into StringBuilder().append().append()....append().toString()) > > Since we're only concerned with the delta of performance, does this really matter? Can you suggest an alternative? The needle really should be like the all the other strings, e.g. 
`dataStringHuge` itself, generated by the setup.

As to whether it really matters; the answer is Amdahl's law. You can indeed measure the delta, but you can't measure the speedup of just the indexOf; not with repeat and concatenation obscuring the numbers.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613864094

From duke at openjdk.org Fri May 24 18:35:11 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 24 May 2024 18:35:11 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]
In-Reply-To:
References:
Message-ID:

On Fri, 17 May 2024 23:59:05 GMT, Scott Gibbons wrote:

>> test/jdk/java/lang/StringBuffer/IndexOf.java line 40:
>>
>>> 38: private static boolean failure = false;
>>> 39: public static void main(String[] args) throws Exception {
>>> 40: String testName = "IndexOf";
>>
>> indentation
>
> Fixed

(missed a `git add`? don't see the updates for this file)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613870558

From kvn at openjdk.org Fri May 24 18:40:12 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 24 May 2024 18:40:12 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 15:33:46 GMT, Scott Gibbons wrote:

>> Thanks for checking. Well I know that the `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register rscratch)` method actually generates rip-relative addresses. Maybe we could copy some of that code.
>
> Changed to `lea` with `InternalAddress()`. Generates the exact same code, but makes more sense. I looked at `movdqu` and see no code that generates RIP-relative loads. It merely checks `reachable()` and adds an intermediate `lea` if not reachable. @djelinski can you clarify please?

I think HotSpot prefers to have full addresses in `lea` for possible patching.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613874603 From cjplummer at openjdk.org Fri May 24 19:52:11 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 24 May 2024 19:52:11 GMT Subject: RFR: 8332917: failure_handler should execute gdb "info threads" command on linux Message-ID: On linux, failure_handler dumps stack traces for all threads, but this dump does not include the name of each thread. The gdb "info threads" command will give a summary of all threads, and if debugging process, the summary will include each thread's name. If debugging a core file, for some reason the thread name is not included, but the summary is still useful. Tested by running some tests that fail with a timeout, and looking at the failure_handler gdb output for both the process and the core file. ------------- Commit messages: - Use 'info threads' instead of 'info thread', although both work equally - Execute gdb 'info threads' command on linux Changes: https://git.openjdk.org/jdk/pull/19401/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19401&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332917 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19401/head:pull/19401 PR: https://git.openjdk.org/jdk/pull/19401 From cjplummer at openjdk.org Fri May 24 19:52:13 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 24 May 2024 19:52:13 GMT Subject: RFR: 8332917: failure_handler should execute gdb "info threads" command on linux In-Reply-To: References: Message-ID: On Fri, 24 May 2024 19:45:21 GMT, Chris Plummer wrote: > On linux, failure_handler dumps stack traces for all threads, but this dump does not include the name of each thread. The gdb "info threads" command will give a summary of all threads, and if debugging process, the summary will include each thread's name. 
If debugging a core file, for some reason the thread name is not included, but the summary is still useful. > > Tested by running some tests that fail with a timeout, and looking at the failure_handler gdb output for both the process and the core file. Here's some output for each: Process: Id Target Id Frame * 1 Thread 0xffff7fcf2a50 (LWP 2749191) "java" 0x0000ffff7fc22ba8 in __pthread_timedjoin_ex () from /lib64/libpthread.so.0 2 Thread 0xffff7d9f31d0 (LWP 2749192) "old-m-a-i-n" 0x0000ffff7fc27d70 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0 3 Thread 0xffff7c51e1d0 (LWP 2749193) "GC Thread#0" 0x0000ffff7fc2a820 in do_futex_wait.constprop () from /lib64/libpthread.so.0 4 Thread 0xffff7c31f1d0 (LWP 2749194) "G1 Main Marker" 0x0000ffff7fc27d70 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0 5 Thread 0xffff5ea991d0 (LWP 2749195) "G1 Conc#0" 0x0000ffff7fc2a820 in do_futex_wait.constprop () from /lib64/libpthread.so.0 ... Core File: Id Target Id Frame * 1 Thread 0xffff7fcf2a50 (LWP 2749191) 0x0000ffff7fc22ba8 in __pthread_timedjoin_ex () from /lib64/libpthread.so.0 2 Thread 0xffff7d9f31d0 (LWP 2749192) 0x0000ffff7fc27d70 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0 3 Thread 0xffff4ffff1d0 (LWP 2749196) 0x0000ffff7fc27d70 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0 4 Thread 0xffff7c51e1d0 (LWP 2749193) 0x0000ffff7fc2a820 in do_futex_wait.constprop () from /lib64/libpthread.so.0 5 Thread 0xffff5ea991d0 (LWP 2749195) 0x0000ffff7fc2a820 in do_futex_wait.constprop () from /lib64/libpthread.so.0 ... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19401#issuecomment-2130253422 From sgibbons at openjdk.org Fri May 24 19:55:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 18:32:53 GMT, Volodymyr Paprotski wrote: >> Fixed > > (missed a `git add`? don't see the updates for this file) Hmmm... Not sure what happened. >> Since we're only concerned with the delta of performance, does this really matter? Can you suggest an alternative? > > The needle really should be like the all the other strings, e.g. `dataStringHuge` itself, generated by the setup. > > As to weather it really matters; the answer is Amdahl's law. You can indeed measure the delta, but you can't measure the speedup of just the indexOf; not with repeat and concatenation obscuring the numbers. I have to believe that any relatively smart compiler would recognize that as a compile-time constant and make the change irrelevant. I've yielded to your desire and changed the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613956309 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613955264 From sgibbons at openjdk.org Fri May 24 19:55:43 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:43 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v18] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:41:58 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 50 commits: >> >> - Merge remote-tracking branch 'origin/master' into indexof >> - Move arrays_equals back to c2_MacroAssembler >> - Merge branch 'openjdk:master' into indexof >> - Remove infinite loop (used for debugging) >> - Merge branch 'openjdk:master' into indexof >> - Cleaned up, ready for review >> - Pre-cleanup code >> - Add JMH. Add 16-byte compares to arrays_equals >> - Better method for mask creation >> - Merge branch 'openjdk:master' into indexof >> - ... and 40 more: https://git.openjdk.org/jdk/compare/b20fa7b4...f52d281d > > test/jdk/java/lang/StringBuffer/IndexOf.java line 81: > >> 79: String shs = (new String((hs_charset == StandardCharsets.UTF_16) ? haystack_16 : haystack)).substring(0, haystackSize); >> 80: >> 81: shs = "$&),,18+-!'8)+"; > > Should really keep the original test unmodified and add new tests as needed The test functionality was not changed. I just added printing of information when a failure occurs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613914184 From sgibbons at openjdk.org Fri May 24 19:55:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Test clarifications ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/5d10a20b..485d02fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=36-37 Stats: 69 lines in 2 files changed: 16 ins; 10 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 24 19:55:43 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 19:55:43 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 
[v19] In-Reply-To: References: Message-ID: On Wed, 15 May 2024 19:34:40 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Rearrange; add lambdas for clarity > > test/jdk/java/lang/StringBuffer/IndexOf.java line 90: > >> 88: >> 89: // printStringBytes(shs.getBytes(hs_charset)); >> 90: for (int i = 0; i < 200000; i++) { > > This wont be a deterministic way to reach the intrinsic. I would suggest copying the idea from test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/Poly1305UnitTestDriver.java > > i.e. Have two `@run main` invocations at the top of this file, one with default parameters, one with `-Xcomp -XX:-TieredCompilation`. You dont need a 'driver' program, that was to handle something else. > > > /* > * @test > * @modules java.base/com.sun.crypto.provider > * @run main java.base/com.sun.crypto.provider.Poly1305KAT > * @summary Unit test for com.sun.crypto.provider.Poly1305. > */ > > /* > * @test > * @modules java.base/com.sun.crypto.provider > * @summary Unit test for IntrinsicCandidate in com.sun.crypto.provider.Poly1305. > * @run main/othervm -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+ForceUnreachable java.base/com.sun.crypto.provider.Poly1305KAT > */ Done. > test/jdk/java/lang/StringBuffer/IndexOf.java line 126: > >> 124: int aNewLength = getRandomIndex(min, max); >> 125: for (int y = 0; y < aNewLength; y++) { >> 126: int achar = generator.nextInt(30) + 30; > > This will only ever generate LL cases, i.e. chars from [30,60]. Could be parametrized to also produce utf16 if instead of 30, offset was in the unicode range Original code. 
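
A minimal sketch of the parametrization suggested in the review comment above, assuming a hypothetical `Coder` selector and `generate` helper (neither is in the original test). Latin-1 strings keep the existing `nextInt(30) + 30` range; for UTF-16 an arbitrary offset of 0x0400 is assumed so that every generated char falls outside Latin-1:

```java
import java.util.Random;

final class EncodedTestStrings {
    /** Hypothetical selector for the character range of a generated string. */
    enum Coder { LATIN1, UTF16 }

    /** Illustrative offset only: every char >= 0x0400 lies outside Latin-1. */
    static final int UTF16_BASE = 0x0400;

    /** Generates `length` random chars, either in [30, 60) or in [0x0400, 0x041E). */
    static String generate(Random rnd, int length, Coder coder) {
        int base = (coder == Coder.LATIN1) ? 30 : UTF16_BASE;
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (rnd.nextInt(30) + base));
        }
        return sb.toString();
    }

    /** Wraps the needle in UTF-16 filler so the expected match index is prefixLen. */
    static String haystackAround(Random rnd, String needle, int prefixLen, int suffixLen) {
        return generate(rnd, prefixLen, Coder.UTF16)
                + needle
                + generate(rnd, suffixLen, Coder.UTF16);
    }

    public static void main(String[] args) {
        // Print the seed so that a failing run can be reproduced.
        long seed = new Random().nextLong();
        System.out.println("seed=" + seed);
        Random rnd = new Random(seed);

        String needle = generate(rnd, 10, Coder.LATIN1);
        String haystack = haystackAround(rnd, needle, 40, 50);

        int idx = haystack.indexOf(needle);
        if (idx != 40) {
            throw new AssertionError("expected 40, got " + idx + " (seed=" + seed + ")");
        }
    }
}
```

Because the filler and the needle use disjoint character ranges, the first match is always at the prefix length. A UTF-16 haystack with a Latin-1 needle would correspond to the UL case as that naming is usually read in HotSpot; that mapping is an assumption here, not something stated in the thread.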
> test/jdk/java/lang/StringBuffer/IndexOf.java line 199: > >> 197: System.out.println("Source="+sourceString.substring(hsBegin, hsBegin + haystackLen)); >> 198: System.out.println("Target="+targetString.substring(nBegin, nBegin + needleLen)); >> 199: System.out.println("haystackLen="+haystackLen+" neeldeLen="+needleLen+" hsBegin="+hsBegin+" nBegin="+nBegin+ > > This looks like 'development scaffolding' (i.e. printf debugging) that was meant to be removed This is additional information printed upon failure instead of just saying "failed" > test/jdk/java/lang/StringBuffer/IndexOf.java line 295: > >> 293: sourceString = generateTestString(99, 100); >> 294: sourceBuffer = new StringBuffer(sourceString); >> 295: targetString = generateTestString(10, 11); > > Generate a random int [0,1,2] for LL, UU, UL, pass that as parameter to generateTestString() to test the other paths. Same for other tests in this file using this pattern. > > This test is specific to haystacklen=100, needlelen=10.. what about other haystack/needle sizes to exercise all the paths in the intrinsic assembler (i.e. haystack >=, <=32, needlelen ={1,2,3,4,5..32..}). Elsewhere already? Original code. > test/jdk/java/lang/StringBuffer/IndexOf.java line 360: > >> 358: System.err.println(" sAnswer = " + sAnswer + ", sbAnswer = " + sbAnswer); >> 359: System.err.println(" testString = '" + testString + "'"); >> 360: System.err.println(" testBuffer = '" + testBuffer + "'"); > > tracing left here and further down Adding more information on failure. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613915508 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613919180 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613920449 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613922554 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613923075 From kvn at openjdk.org Fri May 24 20:15:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 24 May 2024 20:15:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 19:55:40 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> 
StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Test clarifications test/jdk/java/lang/StringBuffer/IndexOf.java line 28: > 26: * @summary Test indexOf and lastIndexOf > 27: * @run main/othervm IndexOf > 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf I suggest to split it into 2 subtest jobs and use `@requires vm.cpu.features ~= ".*avx2.*"` for second which specified `-XX:UseAVX=2`. See `compiler/loopopts/superword/TestDependencyOffsets.java` for example. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613972734 From sgibbons at openjdk.org Fri May 24 20:26:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 20:26:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v39] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Split into two subtest jobs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/485d02fd..69ca8d13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=37-38 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From duke at openjdk.org Fri May 24 20:26:40 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 24 May 2024 20:26:40 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 
[v37] In-Reply-To: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 15:32:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > mov64 => lea(InternalAddress) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4633: > 4631: andl(result, 0x0000000f); // tail count (in bytes) > 4632: andl(limit, 0xfffffff0); // vector count (in bytes) > 4633: jcc(Assembler::zero, 
COMPARE_TAIL); In the `expand_ary2` case, this is the same andl/compare as line 4549; i.e. I think you can just put `jcc(Assembler::zero, COMPARE_TAIL);` on line 4549, inside the if (and move the other jcc into the else branch)? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4639: > 4637: negptr(limit); > 4638: > 4639: bind(COMPARE_WIDE_VECTORS_16); Understanding-check.. this loop will execute at most 2 times, right? i.e. process as many 32-byte chunks as possible, then 1-or-2 16-byte chunks then byte-by-byte? (Still a good optimization, just trying to understand the scope) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4718: > 4716: jmp(TRUE_LABEL); > 4717: } else { > 4718: movl(chr, Address(ary1, limit, scaleFactor)); scaleFactor is always Address::times_1 here (expand_ary2==false), might be clearer to change it back test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 57: > 55: > 56: generator = new Random(); > 57: long seed = generator.nextLong();//-5291521104060046276L; dead code test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 63: > 61: /////////////////////////// WARM-UP ////////////////////////// > 62: > 63: for (int i = 0; i < 20000; i++) { -Xcomp should be more deterministic (and quicker) way to reach the intrinsic (i.e. like the other tests) On other hand, perhaps this doesn't matter? @vnkozlov Understanding-check please.. these tests will run as part of every build from this point-till-infinity; Therefore, long test will affect every openjdk developer. But if this test is not run on every build, then the build-time does not matter, then this test can run for as long as it 'wants'. test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 160: > 158: } > 159: > 160: private static String generateTestString(int min, int max) { I see you have various `Charset[] charSets` above, but this function still only generates LL. Are those separate tests? 
Or am I missing some concatenation somewhere that will convert the generated string string to the correct encoding? You could had implemented my suggestion from IndexOf.generateTestString here instead, so that the tests that do call this function endup with multiple encodings; i.e. similar to what you already do in the next function. I suppose, with addition of String/IndexOf.java that is a moot point. test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 185: > 183: } > 184: > 185: private static int indexOfKernel(String haystack, String needle) { Is the intention of kernels not to be inlined so that it would be part of separate compilation? If so, you probably want to annotate it with `@CompilerControl(CompilerControl.Mode.DONT_INLINE)` i.e. https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 539: > 537: failCount = indexOfKernel("", ""); > 538: > 539: for (int x = 0; x < 1000000; x++) { Should we be concerned about the increased run-time? 
Or does this execute 'quickly enough' ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613940896 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613943518 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613946470 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613955620 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613955354 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613970971 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613967681 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613983597 From sgibbons at openjdk.org Fri May 24 20:26:41 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 20:26:41 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 20:12:07 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Test clarifications > > test/jdk/java/lang/StringBuffer/IndexOf.java line 28: > >> 26: * @summary Test indexOf and lastIndexOf >> 27: * @run main/othervm IndexOf >> 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf > > I suggest to split it into 2 subtest jobs and use `@requires vm.cpu.features ~= ".*avx2.*"` for second which specified `-XX:UseAVX=2`. > See `compiler/loopopts/superword/TestDependencyOffsets.java` for example. Right. Done. Also added `@requires vm.compiler2.enabled` since my stub is only valid with C2. 
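
A sketch of what the split header could look like, assuming a placeholder class name `IndexOfSplitSketch` and a trivial test body. The flags and `@requires` clauses are the ones quoted above; the header actually committed in the PR may differ:

```java
/*
 * @test
 * @summary Test indexOf and lastIndexOf
 * @run main/othervm IndexOfSplitSketch
 */

/*
 * @test
 * @summary Test indexOf and lastIndexOf with the AVX2 path forced through C2
 * @requires vm.cpu.features ~= ".*avx2.*"
 * @requires vm.compiler2.enabled
 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation
 *      -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOfSplitSketch
 */

public class IndexOfSplitSketch {
    /** Runs a few fixed checks and returns the number of failures. */
    static int countFailures() {
        int failures = 0;
        if ("haystack".indexOf("st") != 3) failures++;
        if ("haystack".lastIndexOf("a") != 5) failures++;
        if ("haystack".indexOf("needle") != -1) failures++;
        if ("".indexOf("") != 0) failures++;
        return failures;
    }

    public static void main(String[] args) {
        int failures = countFailures();
        if (failures != 0) {
            throw new RuntimeException(failures + " indexOf checks failed");
        }
    }
}
```

With two `@test` blocks, the harness treats the file as two subtests, so the second one is skipped on machines without AVX2 or without C2 while the first still runs everywhere.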
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613985672

From duke at openjdk.org Fri May 24 20:26:41 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 24 May 2024 20:26:41 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]
In-Reply-To:
References:
Message-ID:

On Wed, 22 May 2024 14:50:40 GMT, Scott Gibbons wrote:

>> test/jdk/java/lang/StringBuffer/IndexOf.java line 284:
>>
>>> 282:
>>> 283: // Note: it is possible although highly improbable that failCount will
>>> 284: // be > 0 even if everthing is working ok
>>
>> This sounds like either a bug or a testcase bug? Same as line 301, `extremely remote possibility of > 1 match`?
>
> This was there from the original author. I think they were trying to imply that a match could occur in the rare case that the same random string was produced. They're random after all, and there's no reason the same sequence couldn't be generated.

Makes sense

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613872215

From jsjolen at openjdk.org Fri May 24 20:37:41 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 24 May 2024 20:37:41 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v108]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
> Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Switch to the static naming of num_pages

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/4aaa0927..c1938adf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=107
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=106-107

  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From sgibbons at openjdk.org Fri May 24 20:47:23 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Fri, 24 May 2024 20:47:23 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v40]
In-Reply-To:
References:
Message-ID:

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments. 
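
The last column of the table above appears to be Latest divided by Score, and the "1.3x on average" figure is consistent with the arithmetic mean of that column. A short check under that reading (the class and method names are illustrative, and the table does not state the benchmark units):

```java
final class SpeedupCheck {
    // "Score" column, in table order.
    static final double[] SCORE = {
        343.573, 1039.081, 55.828, 9.361, 4.216, 3.133, 3.76, 9.186, 14.341,
        6220.918, 5503.556, 6978.854, 5657.499, 7132.541, 16013.389, 7386.123, 9901.671
    };
    // "Latest" column, in table order.
    static final double[] LATEST = {
        317.934, 1053.96, 110.541, 11.906, 4.218, 3.216, 3.761, 9.713, 46.343,
        12154.52, 5540.044, 6818.689, 5474.624, 6863.359, 16162.437, 14771.622, 9782.245
    };

    /** Ratio for one benchmark row, assumed to be Latest / Score. */
    static double ratio(int row) {
        return LATEST[row] / SCORE[row];
    }

    /** Arithmetic mean of the per-row ratios. */
    static double meanRatio() {
        double sum = 0.0;
        for (int i = 0; i < SCORE.length; i++) {
            sum += ratio(i);
        }
        return sum / SCORE.length;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SCORE.length; i++) {
            System.out.printf("row %2d: %.9fx%n", i, ratio(i));
        }
        System.out.printf("arithmetic mean: %.3fx%n", meanRatio());
    }
}
```

The mean comes out at about 1.3. An arithmetic mean is pulled up by a few large rows such as `successBig`; a geometric mean would weight them less.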
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/69ca8d13..be001e2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=38-39 Stats: 13 lines in 2 files changed: 10 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Fri May 24 20:47:24 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 20:47:24 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 19:30:54 GMT, Volodymyr Paprotski wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> mov64 => lea(InternalAddress) > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4633: > >> 4631: andl(result, 0x0000000f); // tail count (in bytes) >> 4632: andl(limit, 0xfffffff0); // vector count (in bytes) >> 4633: jcc(Assembler::zero, COMPARE_TAIL); > > In the `expand_ary2` case, this is the same andl/compare as line 4549; i.e. I think you can just put `jcc(Assembler::zero, COMPARE_TAIL);` on line 4549, inside the if (and move the other jcc into the else branch)? OK. Shortens pathlength by 4 instructions. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4639: > >> 4637: negptr(limit); >> 4638: >> 4639: bind(COMPARE_WIDE_VECTORS_16); > > Understanding-check.. this loop will execute at most 2 times, right? > > i.e. process as many 32-byte chunks as possible, then 1-or-2 16-byte chunks then byte-by-byte? > > (Still a good optimization, just trying to understand the scope) Yes. 
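
The tiering confirmed above (32-byte chunks, then 16-byte chunks, then single bytes) can be modelled in plain Java. This is only a scalar sketch with an invented class name, not the assembler in `c2_MacroAssembler_x86.cpp`; how many times the 16-byte stage runs in the real stub depends on details not shown in this thread:

```java
import java.util.Arrays;

final class ChunkedEquals {
    /** Compares two byte arrays in 32-byte, then 16-byte, then 1-byte steps. */
    static boolean equalsChunked(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int n = a.length;
        int i = 0;

        // Stage 1: as many full 32-byte chunks as fit.
        for (; i + 32 <= n; i += 32) {
            if (!Arrays.equals(a, i, i + 32, b, i, i + 32)) {
                return false;
            }
        }
        // Stage 2: full 16-byte chunks from what is left.
        for (; i + 16 <= n; i += 16) {
            if (!Arrays.equals(a, i, i + 16, b, i, i + 16)) {
                return false;
            }
        }
        // Stage 3: remaining tail, one byte at a time.
        for (; i < n; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] x = new byte[47];
        byte[] y = new byte[47];
        y[46] = 1; // difference in the byte-by-byte tail
        System.out.println(equalsChunked(x, x.clone()));
        System.out.println(equalsChunked(x, y));
    }
}
```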
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4718: > >> 4716: jmp(TRUE_LABEL); >> 4717: } else { >> 4718: movl(chr, Address(ary1, limit, scaleFactor)); > > scaleFactor is always Address::times_1 here (expand_ary2==false), might be clearer to change it back *Sigh*. Changing it back. > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 57: > >> 55: >> 56: generator = new Random(); >> 57: long seed = generator.nextLong();//-5291521104060046276L; > > dead code Fixed > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 63: > >> 61: /////////////////////////// WARM-UP ////////////////////////// >> 62: >> 63: for (int i = 0; i < 20000; i++) { > > -Xcomp should be more deterministic (and quicker) way to reach the intrinsic (i.e. like the other tests) > > On other hand, perhaps this doesn't matter? @vnkozlov Understanding-check please.. these tests will run as part of every build from this point-till-infinity; Therefore, long test will affect every openjdk developer. But if this test is not run on every build, then the build-time does not matter, then this test can run for as long as it 'wants'. This test runs in well under 2 minutes. I'm not sure what is trying to be accomplished? > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 160: > >> 158: } >> 159: >> 160: private static String generateTestString(int min, int max) { > > I see you have various `Charset[] charSets` above, but this function still only generates LL. Are those separate tests? Or am I missing some concatenation somewhere that will convert the generated string string to the correct encoding? > > You could had implemented my suggestion from IndexOf.generateTestString here instead, so that the tests that do call this function endup with multiple encodings; i.e. similar to what you already do in the next function. > > I suppose, with addition of String/IndexOf.java that is a moot point. Yes, I think it's a moot point. Thanks. 
> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 185:
>
>> 183: }
>> 184:
>> 185: private static int indexOfKernel(String haystack, String needle) {
>
> Is the intention of kernels not to be inlined so that it would be part of separate compilation?
>
> If so, you probably want to annotate it with `@CompilerControl(CompilerControl.Mode.DONT_INLINE)`
>
> i.e. https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java

Fixed.

> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 539:
>
>> 537: failCount = indexOfKernel("", "");
>> 538:
>> 539: for (int x = 0; x < 1000000; x++) {
>
> Should we be concerned about the increased run-time? Or does this execute 'quickly enough'

Runs in well under 2 minutes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613997645
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613993657
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1613998432
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614000081
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614000885
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614001480
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614002801
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614003072

From jsjolen at openjdk.org Fri May 24 21:26:27 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 24 May 2024 21:26:27 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v109]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space.
> This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Linkage issue

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/c1938adf..164996b9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=108
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=107-108

  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org Fri May 24 21:35:02 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 24 May 2024 21:35:02 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v110]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Return fake stack if not in detailed mode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/164996b9..67626aca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=109 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=108-109 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From sviswanathan at openjdk.org Fri May 24 22:33:16 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 24 May 2024 22:33:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1754: > 1752: continue; > 1753: } else { > 1754: Label L_loopTop; L_loopTop label not used in the else block. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1612495013 From sviswanathan at openjdk.org Fri May 24 22:33:15 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 24 May 2024 22:33:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v40] In-Reply-To: References: Message-ID: <6r30gPhGsZAoAOSYsP39qr2czQ8Wj7YMOxlP2VZZpAI=.61ee3985-d3a5-40b2-9bce-453253185600@github.com> On Fri, 24 May 2024 20:47:23 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments. 
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1122: > 1120: // eq_mask - The bit mask returned that holds the result of the comparison > 1121: // rTmp - a temporary register > 1122: // rTmp2 - a temporary register There is no rtmp, rtmp2 here. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1129: > 1127: // _masm - Current MacroAssembler instance pointer > 1128: // > 1129: // If (n - k) < 32, need to handle reading past end of haystack Don't see (n-k) < 32 being handled in this function. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614091336 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614092828 From sviswanathan at openjdk.org Fri May 24 22:33:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 24 May 2024 22:33:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Addressing lots of comments. Interim commit. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4737: > 4735: bind(COMPARE_BYTE); > 4736: } else { > 4737: lea(ary1, Address(ary1, expand_ary2 ? 4 : 2)); This change is not required. expand_ary2 code doesn't come here. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1233: > 1231: __ andq(eq_mask, rTmp); > 1232: > 1233: __ testl(eq_mask, eq_mask); Mismatch of operation size q vs l: andq and testl. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1623: > 1621: //////////////////////////////////////////////////////////////////////////////////////// > 1622: // > 1623: // Small haystack (<32 bytes) switch This should be <= 32 bytes. 
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1709: > 1707: // XMM_BYTE_K - last element of needle, broadcast > 1708: // > 1709: // The haystack is >= 32 bytes Should this be > 32 bytes? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609023624 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609043720 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609160143 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1609163535 From sviswanathan at openjdk.org Fri May 24 22:33:17 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 24 May 2024 22:33:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v25] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 17:40:24 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > un-helper-ize preload_needle_helper; try fix for macos build src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 101: > 99: if (isU) { \ > 100: __ blsrl(tmp, mask); \ > 101: __ blsrl(mask, tmp); \ We could do this as __ blsrl(mask, mask); \ __ blsrl(mask, mask); \ Thereby removing the need for tmp. 
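
A small sketch of why the two forms in the comment above give the same result, assuming the usual BLSR semantics of `dst = src & (src - 1)` (clear the lowest set bit). The helpers `blsr`, `clear_two_with_tmp` and `clear_two_in_place` are illustrative names only and are not HotSpot code. Presumably the bit is cleared twice in the `isU` case because a byte-wise compare mask carries two bits per UTF-16 character, but that reading is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for the x86 BLSR instruction (an assumption about its
// semantics): returns src with its lowest set bit cleared.
static inline uint32_t blsr(uint32_t src) {
  return src & (src - 1);
}

// Shape of the code under review: the first result goes to a scratch value.
static uint32_t clear_two_with_tmp(uint32_t mask) {
  uint32_t tmp = blsr(mask);
  mask = blsr(tmp);
  return mask;
}

// Shape of the suggested code: source and destination are the same value,
// so no scratch register is needed.
static uint32_t clear_two_in_place(uint32_t mask) {
  mask = blsr(mask);
  mask = blsr(mask);
  return mask;
}
```

Both helpers clear the two lowest set bits of `mask`; the second one simply reuses `mask` as its own destination, which is what removes the need for `tmp`.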
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610453473 From sviswanathan at openjdk.org Fri May 24 22:33:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 24 May 2024 22:33:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v27] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Revert last change to IndexOf.java src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1218: > 1216: // isU - true if argument 
encoding is either UU or UL > 1217: // eq_mask - The bit mask returned that holds the result of the comparison > 1218: // needleLen - a temporary register. Only used if isUL true needleLen is not a temporary register. needleLen is used to read the kThByte from haystack below when !sizeKnown so must hold valid info. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1439: > 1437: // back to last valid read position > 1438: __ cmpq(hsPtrRet, last); > 1439: __ jb_b(L_midLoop); could be jbe_b? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610617943 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1610740998 From sgibbons at openjdk.org Fri May 24 23:11:13 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 23:11:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v20] In-Reply-To: References: Message-ID: <3A1V-APGmN8EO49abMKEzdGA-VLYsIiKtTrJCPtuYUc=.a6c13f99-c314-4872-a347-02e6c8a6b8aa@github.com> On Tue, 21 May 2024 22:39:42 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressing lots of comments. Interim commit. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4737: > >> 4735: bind(COMPARE_BYTE); >> 4736: } else { >> 4737: lea(ary1, Address(ary1, expand_ary2 ? 4 : 2)); > > This change is not required. expand_ary2 code doesn't come here. Right. Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1233: > >> 1231: __ andq(eq_mask, rTmp); >> 1232: >> 1233: __ testl(eq_mask, eq_mask); > > Mismatch of operation size q vs l: andq and testl. Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1623: > >> 1621: //////////////////////////////////////////////////////////////////////////////////////// >> 1622: // >> 1623: // Small haystack (<32 bytes) switch > > This should be <= 32 bytes. Fixed. 
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1709: > >> 1707: // XMM_BYTE_K - last element of needle, broadcast >> 1708: // >> 1709: // The haystack is >= 32 bytes > > Should this be > 32 bytes? Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614114763 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127986 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127889 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127781 From sgibbons at openjdk.org Fri May 24 23:11:15 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 23:11:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v40] In-Reply-To: <6r30gPhGsZAoAOSYsP39qr2czQ8Wj7YMOxlP2VZZpAI=.61ee3985-d3a5-40b2-9bce-453253185600@github.com> References: <6r30gPhGsZAoAOSYsP39qr2czQ8Wj7YMOxlP2VZZpAI=.61ee3985-d3a5-40b2-9bce-453253185600@github.com> Message-ID: <38c22L3m_I_joyXB6ZAzaAaec3-Gj4spqor35Pv1h6c=.31c1e620-dd74-49d9-8c8b-a4864167a6cc@github.com> On Fri, 24 May 2024 22:26:56 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments. > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1122: > >> 1120: // eq_mask - The bit mask returned that holds the result of the comparison >> 1121: // rTmp - a temporary register >> 1122: // rTmp2 - a temporary register > > There is no rtmp, rtmp2 here. Fixed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1129: > >> 1127: // _masm - Current MacroAssembler instance pointer >> 1128: // >> 1129: // If (n - k) < 32, need to handle reading past end of haystack > > Don't see (n-k) < 32 being handled in this function. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614116033 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614116814 From sgibbons at openjdk.org Fri May 24 23:11:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 23:11:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Fri, 24 May 2024 00:09:38 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1754: > >> 1752: continue; >> 1753: } else { >> 1754: Label L_loopTop; > > L_loopTop label not used in the else block. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614117294 From sgibbons at openjdk.org Fri May 24 23:11:17 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 23:11:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v25] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 18:22:24 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> un-helper-ize preload_needle_helper; try fix for macos build > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 101: > >> 99: if (isU) { \ >> 100: __ blsrl(tmp, mask); \ >> 101: __ blsrl(mask, tmp); \ > > We could do this as > __ blsrl(mask, mask); \ > __ blsrl(mask, mask); \ > Thereby removing the need for tmp. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127638 From sgibbons at openjdk.org Fri May 24 23:11:19 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 23:11:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v27] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 20:36:25 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert last change to IndexOf.java > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1218: > >> 1216: // isU - true if argument encoding is either UU or UL >> 1217: // eq_mask - The bit mask returned that holds the result of the comparison >> 1218: // needleLen - a temporary register. Only used if isUL true > > needleLen is not a temporary register. needleLen is used to read the kThByte from haystack below when !sizeKnown so must hold valid info. Fixed. > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1439: > >> 1437: // back to last valid read position >> 1438: __ cmpq(hsPtrRet, last); >> 1439: __ jb_b(L_midLoop); > > could be jbe_b? Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127526 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614127356 From sgibbons at openjdk.org Fri May 24 23:15:26 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 24 May 2024 23:15:26 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>
>
> Benchmark                              Score      Latest     Latest/Score
> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
> StringIndexOf.success                  9.186      9.713      1.057369911x
> StringIndexOf.successBig               14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  Fix test; review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16753/files
  - new: https://git.openjdk.org/jdk/pull/16753/files/be001e2c..b154faee

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=40
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=39-40

  Stats: 31 lines in 3 files changed: 4 ins; 13 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/16753.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753

PR: https://git.openjdk.org/jdk/pull/16753

From ccheung at openjdk.org Fri May 24 23:25:15 2024
From: ccheung at openjdk.org (Calvin Cheung)
Date: Fri, 24 May 2024 23:25:15 GMT
Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v4]
In-Reply-To:
References:
Message-ID: <7AWghiG_TSVMjkfVfA_krBMWZNMRVlakI7kny1tuJ9s=.d4ca3b29-923a-48e6-80d7-97c72ea6e308@github.com>

> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified.
>
> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters.
>
> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output.
>
> Passed tiers 1 - 4 testing.

Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision:

  @dholmes-ora comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18790/files
  - new: https://git.openjdk.org/jdk/pull/18790/files/51b86d42..209c4662

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=02-03

  Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/18790.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790

PR: https://git.openjdk.org/jdk/pull/18790

From ccheung at openjdk.org Fri May 24 23:30:01 2024
From: ccheung at openjdk.org (Calvin Cheung)
Date: Fri, 24 May 2024 23:30:01 GMT
Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3]
In-Reply-To:
References: <2TAhYdUQ5KXWODYMvzb15NqKhkXfFjV7RW9oHeVIg0U=.73940200-8be6-4427-9348-44d50fd22286@github.com>
Message-ID: <_xcd8u4sLU0-WIqOo7gjpotaKrQlQpqsj6IaWnoyKgw=.e246af1e-7226-4881-93f3-ad3b159edf96@github.com>

On Fri, 24 May 2024 06:59:32 GMT, David Holmes wrote:

>> If `UsePerfData` is set to false, `ProfileClassLinkage` is set to false in arguments.cpp:
>>
>>     if (ProfileClassLinkage && !UsePerfData) {
>>       if (!FLAG_IS_DEFAULT(ProfileClassLinkage)) {
>>         warning("Disabling ProfileClassLinkage since UsePerfData is turned off.");
>>         FLAG_SET_DEFAULT(ProfileClassLinkage, false);
>>       }
>>     }
>>
>> I will remove the extra parentheses.
>
> Yes but if `UsePerfData` is true it doesn't mean `ProfileClassLinkage` is true.

The `_perf_classes_inited` counter is a pre-existing counter that depends only on `UsePerfData`. `ProfileClassLinkage` should not have any effect on the counter.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1614140958

From sviswanathan at openjdk.org Sat May 25 00:46:19 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Sat, 25 May 2024 00:46:19 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41]
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix test; review comments src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 316: > 314: // Set up jump tables. Used when needle size <= NUMBER_OF_CASES > 315: setup_jump_tables(ae, L_returnRBP, L_checkRangeAndReturn, L_bigCaseFixupAndReturn, > 316: &big_jump_table, &small_jump_table, _masm); We could directly use L_returnError here instead of L_returnRBP. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 476: > 474: // Used to check and return value in rbp - usually error > 475: __ bind(L_returnRBP); > 476: __ movq(rax, rbp); This seems spurious as rax is being overwritten at line 489. Did you intend to return -1? 
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1816: > 1814: byte_compare_helper(i + 1, L_loopTop, L_fixup, needle, needle_val, hs_ptr, eq_mask, set_bit, > 1815: rTmp4, ae, _masm); > 1816: } L_checkRange on NoMatch could be L_error instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614172379 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614172021 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614175081 From sviswanathan at openjdk.org Sat May 25 00:46:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 25 May 2024 00:46:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1740: > 1738: // > 1739: // If a match is found, jump to L_checkRangeAndReturn, which ensures the > 1740: // matched needle is not past the end of the haystack. These labels are not in this function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614125339 From kbarrett at openjdk.org Sat May 25 02:13:18 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 25 May 2024 02:13:18 GMT Subject: RFR: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' [v13] In-Reply-To: References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: On Fri, 24 May 2024 13:04:14 GMT, Lei Zaakjyu wrote: >> follow up 8267941 > > Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - review > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694 > - restore > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8330694 > - review > - Merge branch 'master' into JDK-8330694 > - fix indentation > - also tidy up > - tidy up > - rename Still looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18871#pullrequestreview-2078316714 From duke at openjdk.org Sat May 25 02:13:19 2024 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 25 May 2024 02:13:19 GMT Subject: Integrated: 8330694: Rename 'HeapRegion' to 'G1HeapRegion' In-Reply-To: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> References: <3IdWn9VGEERd8v9RcH2E_LzjVo0L8nMfi5jGWmhgVuM=.6b5b3be4-bfbd-4376-9580-48d78d75665c@github.com> Message-ID: On Sat, 20 Apr 2024 02:04:20 GMT, Lei Zaakjyu wrote: > follow up 8267941 This pull request has now been integrated. 
Changeset: 985b9ce7
Author:    Lei Zaakjyu
Committer: Kim Barrett
URL:       https://git.openjdk.org/jdk/commit/985b9ce79a2d620a8b8675d1ae6c9730d72a757f
Stats:     1003 lines in 123 files changed: 1 ins; 4 del; 998 mod

8330694: Rename 'HeapRegion' to 'G1HeapRegion'

Reviewed-by: cjplummer, kbarrett, tschatzl

-------------

PR: https://git.openjdk.org/jdk/pull/18871

From stuefe at openjdk.org Sat May 25 06:05:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 25 May 2024 06:05:16 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 08:57:38 GMT, Johan Sjölen wrote:

>> src/hotspot/share/nmt/vmatree.hpp line 146:
>>
>>> 144:   struct SingleDiff {
>>> 145:     int64_t reserve;
>>> 146:     int64_t commit;
>>
>> The typical type would be `ssize_t`, not int64.
>>
>> Apart from clarity, I am not sure how int64 would work on 32-bit.
>
> That doesn't seem right to me. `ssize_t` has a guaranteed range of `[-1, INT_MAX)`, the -1 being there for errors. We need as full of a range of negative numbers as possible.
>
> Good question regarding 32-bit, will have to think about that one.
>
> Btw: Yes, I know, we can underflow or overflow the diff, but in practice no one will allocate `2**64` bytes, I am willing to take that risk.

Hm. We use ssize_t in many places for working with memory deltas. It works on all our platforms. And I don't see a good alternative here.

int64 is not a replacement. To me, int64 feels like using void* for pointers, obfuscating intent. It's a memory size; I'd therefore like to see that in code. Apart from the obvious 64/32 bit problem.

One could probably use ptrdiff_t, but no pointers are involved, so it seems awkward.

One could use size_t+boolean tuples, and that may be the hypercorrect way. It would solve the problem of not being

> Btw: Yes, I know, we can underflow or overflow the diff, but in practice no one will allocate 2**64 bytes, I am willing to take that risk.
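
For illustration only, a rough sketch of what a size_t+boolean delta as mentioned above could look like. `SizeDelta`, `make_delta` and `apply_delta` are hypothetical names and not part of the patch under review; the sketch assumes the caller never applies a negative delta larger than the value it is applied to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: a memory-size delta stored as magnitude plus sign, so the
// full size_t range is available in both directions on 32-bit and 64-bit.
struct SizeDelta {
  size_t magnitude;  // absolute size of the change, in bytes
  bool   negative;   // true if the change shrinks the tracked amount
};

// Compute the delta that takes `before` to `after` without any signed
// arithmetic, so nothing can overflow.
static SizeDelta make_delta(size_t before, size_t after) {
  if (after >= before) {
    return SizeDelta{after - before, false};
  }
  return SizeDelta{before - after, true};
}

// Apply a delta to a running total. Assumes a negative delta never exceeds
// `value`; a real implementation would want to assert that.
static size_t apply_delta(size_t value, SizeDelta d) {
  return d.negative ? value - d.magnitude : value + d.magnitude;
}
```

The cost of this representation is that every addition and subtraction has to branch on the sign, which is part of why a plain signed type is tempting.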
An overflow would occur at 2**63. On 32-bit, at 2**31. So, on 32-bit you cannot express data sizes >= 2GB correctly. Which seems to me like a real limit. Maybe a size_t+boolean tuple is the right way to go.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1614377754

From stuefe at openjdk.org Sat May 25 06:05:14 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Sat, 25 May 2024 06:05:14 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To: References: Message-ID:

On Fri, 24 May 2024 12:34:12 GMT, Johan Sjölen wrote:

>> This is clearly a point of confusion, as Gerard also asked about this. The answer is that the MFT only cares about memory in a file, which is always considered committed... So we consider reserved memory to be committed. Yeah, let's just change it so that MFT always commits memory instead.
>
> Switched it around a bit.

For the purpose of consistency with the rest of NMT, and to be able to use normal VirtualSummary etc, maybe MFT should mark the whole range as reserved upfront?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1614378052

From jsjolen at openjdk.org Sat May 25 06:05:15 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Sat, 25 May 2024 06:05:15 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To: References: Message-ID:

On Fri, 24 May 2024 07:06:54 GMT, Thomas Stuefe wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Lower number of pages
>
> src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 95:
>
>> 93: Link** _buckets;
>> 94: GrowableArrayCHeap _stacks;
>> 95: bool _is_detailed_mode;
>
> _is_detailed: I somehow don't think this rather low level class should care about and copy the MemTracker state. I like "one truth only", which is MemTracker::enabled.
I'd rather see this handled at the call site. > > If we only need it to prevent allocation of the bucket table at construction time, I'd allocate that one with malloc. It's not my favourite part either, but it should receive the truth from the `MemTracker::enabled`. >If we only need it to prevent allocation of the bucket table at construction time, I'd allocate that one with malloc. It's also used for returning fake `StackIndex`:es for `VMATree` if we're in summary mode. I'll look more at this next week. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1614044361 From stuefe at openjdk.org Sat May 25 06:18:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 25 May 2024 06:18:01 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> Message-ID: On Fri, 24 May 2024 13:55:58 GMT, Afshin Zafari wrote: > > 1. NMT assumes reserves and commits to be different layers and, e.g., for committed regions to be fully contained in a reserved region. This is wrong and does not reflect the realities of mmap. We can overlay and overlap any reservation/committing/uncommitting/releasing in any way we want. > > On Windows, a commit without reserve is not allowed. ([reference](https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc#:~:text=MEM_COMMIT%20%7C%20MEM_RESERVE.-,Attempting%20to%20commit%20a%20specific%20address%20range%20by%20specifying%20MEM_COMMIT%20without%20MEM_RESERVE%20and%20a%20non%2DNULL%20lpAddress%20fails%20unless%20the%20entire%20range%20has%20already%20been%20reserved.%20The%20resulting%20error%20code%20is%20ERROR_INVALID_ADDRESS.,-An%20attempt%20to)) Sorry, but why does NMT have to enforce that? 
NMT should allow what the most flexible of OSes allows. And mmap on POSIX is a lot more flexible than VirtualAlloc on Windows. We want to be able to use the APIs that the OS gives us to the fullest. If the OS allows us to reserve a memory range with a single system call, then to treat that range as disjoint entities that I can commit/release independently of one another, the JVM should be able to do just that without having to worry about NMT. NMT should cope with that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2130872202

From alanb at openjdk.org Sat May 25 06:36:12 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Sat, 25 May 2024 06:36:12 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41]
In-Reply-To: References: Message-ID:

On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix test; review comments test/jdk/java/lang/StringBuffer/IndexOf.java line 47: > 45: public class IndexOf { > 46: > 47: static Random generator = new Random(); @RogerRiggs Would you have cycles to look at Scott's changes to this test? I suspect it will need to be re-structured, re-formatted, and commented to get into maintainable shape. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614383260 From iklam at openjdk.org Sat May 25 06:44:02 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 25 May 2024 06:44:02 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> Message-ID: On Fri, 24 May 2024 05:20:15 GMT, Calvin Cheung wrote: > Okay my first reaction here is "I object!". I get that Leyden wants to be able to easily compare startup costs between itself and mainline, but what is this costing mainline? Even if these counters are not active there is an impact on the code execution and I want to know that impact is negligible. These counters are useful in the mainline as well. We want to be able to use `java -Xlog:init` to diagnose start-up time performance for the mainline. The main cost of the performance counters is reading of the clock. All the new counters added in the PR are guarded by a global flag, so the cost is negligible when the logging is not enabled. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18790#issuecomment-2130928570 From iklam at openjdk.org Sat May 25 06:48:26 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 25 May 2024 06:48:26 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. 
> > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. > > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. > > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) > > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. 
We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Fixed typo in previous commit - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - @matias9927 comments - moved remove_resolved_field_entries_if_non_deterministic() to cpCache - Merge branch 'master' into 8293980-resolve-fields-at-dumptime - 8293980: Resolve CONSTANT_FieldRef at CDS dump time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19355/files - new: https://git.openjdk.org/jdk/pull/19355/files/3900c568..89184c33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=01-02 Stats: 13691 lines in 428 files changed: 7998 ins; 3129 del; 2564 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From iklam at openjdk.org Sat May 25 06:48:27 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 25 May 2024 06:48:27 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v2] In-Reply-To: <6vxtp58v6Nz74xdb5BbmEjDqvk5IDeRlUjJ6sDNFSC0=.2d8868a2-30f6-4e7e-a0cc-8a4b47998508@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> 
<7Kk3VF3qMR0IdptWLG1GGiWLbDm1BfCP2zBh7s6n3WE=.f245c5a2-cc27-4331-a401-1eaea41262ed@github.com> <6vxtp58v6Nz74xdb5BbmEjDqvk5IDeRlUjJ6sDNFSC0=.2d8868a2-30f6-4e7e-a0cc-8a4b47998508@github.com> Message-ID:

On Thu, 23 May 2024 20:50:47 GMT, Matias Saavedra Silva wrote:

>> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>>
>> - Merge branch 'master' into 8293980-resolve-fields-at-dumptime
>> - 8293980: Resolve CONSTANT_FieldRef at CDS dump time
>
> src/hotspot/share/oops/constantPool.cpp line 464:
>
>> 462: if (cache() != nullptr) {
>> 463: // cache() is null if this class is not yet linked.
>> 464: remove_resolved_field_entries_if_non_deterministic();
>
> These methods look like they can belong to the constant pool cache instead. Can cpCache call the ClassLinker code instead so this can be part of `cache()->remove_unshareable_info()`?

I moved remove_resolved_field_entries_if_non_deterministic() to cpCache as you suggested. I removed the functions for indy and method, as dumptime resolution for those types of entries is not yet implemented.

> src/hotspot/share/oops/constantPool.cpp line 520:
>
>> 518: int cp_index = rfi->constant_pool_index();
>> 519: bool archived = false;
>> 520: bool resolved = rfi->is_resolved(Bytecodes::_putfield) ||
>
> Is one of these meant to be `is_resolved(Bytecodes::get_field)` ?

Fixed.

> src/hotspot/share/oops/resolvedFieldEntry.hpp line 65:
>
>> 63: _tos_state = other._tos_state;
>> 64: _flags = other._flags;
>> 65: _get_code = other._get_code;
>
> The fields `_get_code` and `_put_code` are normally set atomically, does this need to be the case when copying as well?

This is done while the ResolvedFieldEntries are being prepared during class rewriting. All access is single-threaded, so there's no need for atomic operations.
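
For anyone who wants to experiment with the `@cp` classlist lines from the PR description (for example `@cp java/util/Objects 2 19 106`), here is a minimal sketch of reading one such line. It is not the `ClassListParser` code from the patch: `CpLine` and `parse_cp_line` are hypothetical names, and the only format assumed is what the description shows, i.e. the `@cp` tag, a class name, and then decimal constant pool indices.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative record for one "@cp" classlist line (hypothetical type, not
// part of HotSpot).
struct CpLine {
  std::string class_name;    // e.g. "java/util/Objects"
  std::vector<int> indices;  // constant pool indices resolved in the training run
};

// Parses "@cp <class-name> <index>...". Returns false when the tag is missing
// or an index is not a decimal number in the range 1..65535.
bool parse_cp_line(const std::string& line, CpLine& out) {
  std::istringstream in(line);
  std::string tag;
  if (!(in >> tag) || tag != "@cp") return false;
  if (!(in >> out.class_name)) return false;
  out.indices.clear();
  std::string token;
  while (in >> token) {
    int index = 0;
    for (char c : token) {
      if (c < '0' || c > '9') return false;  // reject anything but digits
      index = index * 10 + (c - '0');
      if (index > 65535) return false;       // constant pool indices are 16-bit
    }
    if (index == 0) return false;  // entry 0 of a constant pool is never used
    out.indices.push_back(index);
  }
  return true;
}
```

Calling `parse_cp_line` on the example line from the description fills `class_name` with `java/util/Objects` and `indices` with 2, 19 and 106. A line without the `@cp` tag is rejected rather than guessed at.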
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1614389410 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1614389465 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1614390838 From jsjolen at openjdk.org Sat May 25 08:40:15 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sat, 25 May 2024 08:40:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Sat, 25 May 2024 06:01:58 GMT, Thomas Stuefe wrote: >> Switched it around a bit. > > For the purpose of consistency with the rest of NMT, and to be able to use normal VirtualSummary etc, maybe MFT should mark the whole range as reserved upfront? I don't think that's the right choice here as the typical use-case is probably that you: 1. Reserve memory in the virtual address space 2. Make a memory file 3. Map in memory from that file to the memory you previously reserved So if the memory file is also reserved, then that would cause "double accounting" of sorts. I'm also not sure if you have equivalent semantics of reserved memory for temporary files (doesn't actually take up space, is non-writeable). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1614467229 From amitkumar at openjdk.org Sat May 25 08:52:08 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 25 May 2024 08:52:08 GMT Subject: RFR: 8319947: Recursive lightweight locking: s390x implementation [v4] In-Reply-To: References: <_HXRejW4TcLfTYbXlRQUVyejaomiwx6HhObYfG1lX6E=.f89b29a0-4c2d-43e1-83be-d7bafd84816c@github.com> Message-ID: On Thu, 23 May 2024 12:49:16 GMT, Amit Kumar wrote: >> s390x port for recursive locking. 
>>
>> testing:
>> - [x] build fastdebug-vm
>> - [x] build slowdebug-vm
>> - [x] build release-vm
>> - [x] build optimized-vm
>> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm)
>> - [x] with C1
>> - [x] with C2
>> - [x] with interpreter
>> - [x] ./test/jdk/java/util/concurrent (release-vm)
>> - [x] with C1
>> - [x] with C2
>> - [x] with interpreter
>> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm)
>> - [x] with C1
>> - [x] with C2
>> - [x] with interpreter
>> - [x] tier1 with fastdebug-vm
>> - [x] tier1 with slowdebug-vm
>> - [x] tier1 with release-vm
>>
>> *BenchMarks*:
>>
>> Results from Performance LPARs :
>>
>>
>> Locking Mode = 1 (without Patch)
>>
>> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
>> LockUnlock.testContendedLock                      100  avgt   12      5.144 ±  0.035  ns/op
>> LockUnlock.testRecursiveLockUnlock                100  avgt   12   3824.742 ± 89.475  ns/op
>> LockUnlock.testRecursiveSynchronization           100  avgt   12     25.348 ±  0.559  ns/op
>> LockUnlock.testSerialLockUnlock                   100  avgt   12    466.629 ±  3.036  ns/op
>> LockUnlock.testSimpleLockUnlock                   100  avgt   12    468.532 ±  1.793  ns/op
>> Finished running test 'micro:vm.lang.LockUnlock'
>>
>> Locking Mode = 1 (with patch)
>>
>> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
>> LockUnlock.testContendedLock                      100  avgt   12      5.146 ±  0.027  ns/op
>> LockUnlock.testRecursiveLockUnlock                100  avgt   12   3833.175 ± 75.863  ns/op
>> LockUnlock.testRecursiveSynchronization           100  avgt   12     25.206 ±  0.519  ns/op
>> LockUnlock.testSerialLockUnlock                   100  avgt   12    473.973 ±  2.103  ns/op
>> LockUnlock.testSimpleLockUnlock                   100  avgt   12    470.749 ±  2.229  ns/op
>> Finished running test 'micro:vm.lang.LockUnlock'
>>
>>
>>
>>
>> Locking Mode = 2 (without Patch)
>>
>> Benchmark                                (innerCount)  Mode  Cnt      Score    Error  Units
>> LockUnlock.testContendedLock                      100  avgt   12      4.688 ±  0.051  ns/op
>> LockUnlock.testRecursiveLockUnlock                100  avgt   12  12800.544 ± 92.265  ns/op
>> LockUnlock.testRecursiveSynchronization           100  avgt   12     26.486 ±  2.229  ns/op
>> LockUnlock.testSerialLockUnlock                   100  avgt   12    424.499 ±  0.416  ns/op
>> LockUnlock.te...
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
> minor code formatting & variable renamings

Thanks Axel for the review. Let's wait for @TheRealMDoerr & @RealLucy comments ;-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18878#issuecomment-2131136503

From jsjolen at openjdk.org Sat May 25 09:03:14 2024
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Sat, 25 May 2024 09:03:14 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105]
In-Reply-To: References: Message-ID:

On Fri, 24 May 2024 06:54:24 GMT, Thomas Stuefe wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Lower number of pages
>
> src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 80:
>
>> 78: }
>> 79: link = link->next;
>> 80: }
>
> Good. We do a youngest-first search if I am seeing right. Was that deliberate? The chance of the most recent callstacks reoccurring is a lot higher than seeing older stacks.

It wasn't deliberate at all, I have no idea of temporal locality of specific stack traces. I guess that makes sense, Metaspace tends to do a lot of allocations clustered closely in time for example. Anyway, you are right, that is what we do.

> src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 92:
>
>> 90: // 4099 gives a 50% probability of collisions at 76 stacks (as per birthday problem).
>> 91: static const constexpr int default_nr_buckets = 4099;
>> 92: int _nr_buckets;
>
> isn't this normally called table_size or somesuch? _nr_buckets sounds like number of items, which this is not.

I think table_size is more established in Hotspot, I'll switch.
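
The figure in the quoted code comment (4099 buckets, 50% collision probability at 76 stacks) can be checked with a few lines. This is only a sketch under the usual birthday-problem assumption that every distinct stack hashes uniformly and independently into one of the buckets; the real hash function may behave differently. The function names are illustrative and do not come from the patch.

```cpp
#include <cassert>

// Probability that at least two of `items` distinct keys land in the same
// bucket when each key is hashed uniformly and independently into one of
// `buckets` slots (the classic birthday problem).
double collision_probability(int items, int buckets) {
  assert(items >= 0 && buckets > 0);
  if (items > buckets) return 1.0;  // pigeonhole: a collision is certain
  double no_collision = 1.0;
  for (int i = 1; i < items; i++) {
    // The (i+1)-th key must avoid the i buckets that are already occupied.
    no_collision *= 1.0 - static_cast<double>(i) / buckets;
  }
  return 1.0 - no_collision;
}

// Smallest number of keys for which a collision is at least as likely as not.
int items_for_even_odds(int buckets) {
  int n = 0;
  while (collision_probability(n, buckets) < 0.5) n++;
  return n;
}
```

Under that assumption `items_for_even_odds(4099)` comes out at 76, which matches the comment: with 75 stacks the collision probability is still a little below one half, with 76 it is a little above.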
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1614483890 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1614486106 From kbarrett at openjdk.org Sat May 25 15:15:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 25 May 2024 15:15:02 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: On Fri, 24 May 2024 13:30:41 GMT, Matthias Baesken wrote: > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. > So add an attribute to this function that it has undefined behavior. 
> See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" src/hotspot/share/utilities/vmError.cpp line 2093: > 2091: static void ALWAYSINLINE crash_with_sigfpe() { > 2092: > 2093: // generate a native synchronous SIGFPE where possible; Maybe simpler would be to change the definition to only use the divide-by-zero approach for _WIN32 and always use the currently conditional fallback to pthread_kill on non-_WIN32. Especially in light of the fact that the divide-by-zero approach doesn't work on some platforms. I also wonder if the comment about OSX incorrectly implementing raise is correct? Maybe that's been fixed? Or maybe it's not a bug, but a BSD-ism? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19394#discussion_r1614725794 From sgibbons at openjdk.org Sat May 25 21:57:13 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 21:57:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41] In-Reply-To: References: Message-ID: On Sat, 25 May 2024 00:15:03 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test; review comments > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 316: > >> 314: // Set up jump tables. Used when needle size <= NUMBER_OF_CASES >> 315: setup_jump_tables(ae, L_returnRBP, L_checkRangeAndReturn, L_bigCaseFixupAndReturn, >> 316: &big_jump_table, &small_jump_table, _masm); > > We could directly use L_returnError here instead of L_returnRBP. OK > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 476: > >> 474: // Used to check and return value in rbp - usually error >> 475: __ bind(L_returnRBP); >> 476: __ movq(rax, rbp); > > This seems spurious as rax is being overwritten at line 489. Did you intend to return -1? Removed all references to L_returnRBP. 
Replaced with L_returnError. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1816: > >> 1814: byte_compare_helper(i + 1, L_loopTop, L_fixup, needle, needle_val, hs_ptr, eq_mask, set_bit, >> 1815: rTmp4, ae, _masm); >> 1816: } > > L_checkRange on NoMatch could be L_error instead. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614900796 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614903860 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614901577 From sgibbons at openjdk.org Sat May 25 21:57:14 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 21:57:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v35] In-Reply-To: References: <-vyOZzeMslZqgJpTsQnnOWi4abWiM8fNeWSVx5LEHm8=.d37011ee-102c-4874-aa26-d113949d25ea@github.com> Message-ID: On Fri, 24 May 2024 23:04:55 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments - move stubGen*_string.cpp to c2_stubGen*_string.cpp > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1740: > >> 1738: // >> 1739: // If a match is found, jump to L_checkRangeAndReturn, which ensures the >> 1740: // matched needle is not past the end of the haystack. > > These labels are not in this function. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1614901350 From sgibbons at openjdk.org Sat May 25 22:16:41 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 22:16:41 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v42] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments; fix reading past end of haystack when (n-k) < 32 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/b154faee..e13c7ea4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=40-41 Stats: 78 lines in 1 file changed: 29 ins; 9 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Sat May 25 22:19:41 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 25 May 2024 22:19:41 GMT Subject: RFR: 
8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/e13c7ea4..15994a39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=41-42 Stats: 2 lines in 2 files changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 
PR: https://git.openjdk.org/jdk/pull/16753 From jwaters at openjdk.org Sun May 26 05:12:08 2024 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 26 May 2024 05:12:08 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: On Fri, 24 May 2024 13:30:41 GMT, Matthias Baesken wrote: > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. > So add an attribute to this function that it has undefined behavior. > See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. 
This is undefined per the C and C++ standards" Would've used the C++14 attribute syntax for this, but oh well ------------- PR Review: https://git.openjdk.org/jdk/pull/19394#pullrequestreview-2079405652 From stuefe at openjdk.org Sun May 26 06:11:06 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 26 May 2024 06:11:06 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: <78byblRrnZErliour4J6QZABWFIwihnpK6PImRmsVJI=.ded439ab-6548-40e6-a1b1-6b9162c71543@github.com> On Fri, 24 May 2024 13:30:41 GMT, Matthias Baesken wrote: > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero to get a signal . 
This is desired by us so not really an error but ubsan cannot know this. > So add an attribute to this function that it has undefined behavior. > See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" @kimbarrett @MBaesken When this was written, the point was to raise a "real" SIGFPE. That matters because the behavior is subtly different from a real signal compared to one faked with raise (asynchronous vs synchronous). Among other things, this SIGFPE is used for regression testing https://bugs.openjdk.org/browse/JDK-8065895. JDK-8065895 described a situation where we accidentally blocked all but the currently processed signal in the signal handler. That meant if we process a synchronous signal (e.g. SIGSEGV) and another, different, synchronous signal happens (e.g. SIGILL), the VM won't handle it in the secondary handler. Instead, depending on the OS, the process either dies immediately without core or it hangs in the kernel. To regression-test the fix, we need to be able to trigger two different synchronous signals. I believe I used SIGILL and SIGSEGV in my original patch in the closed-source SAP JVM. Both are easy to trigger. But then I got resistance against triggering SIGILL, though, and therefore OpenJDK triggers SIGFPE instead of SIGILL. With the unfortunate effect that the test won't work as expected on all platforms. Apart from JDK-8065895, it was also used to check hs-err printing in general. But I guess for that we could use a raised signal. Replacing the triggering with raise will make the regression test for JDK-8065895 toothless. We may just as well remove it then. I remember it being a pain to investigate (no core, no hs-err file), so we should come up with a replacement. We could replace it with an explicit check that tests that the signal handler masks inside error reporting are set up correctly. 
That is not the same as the real thing, but I guess it would be the next best thing. If we keep it, we need a comment in controlled_crash, because this discussion re-occurs at regular intervals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19394#issuecomment-2132079633 From djelinski at openjdk.org Sun May 26 06:21:15 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Sun, 26 May 2024 06:21:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v33] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 18:37:13 GMT, Vladimir Kozlov wrote: >> Changed to `lea` with `InternalAddress()`. Generates the exact same code, but makes more sense. I looked at `movdqu` and see no code that generates RIP-relative loads. It merely checks `reachable()` and adds an intermediate `lea` if not reachable. @djelinski can you clarify please? > > I think HotSpot prefer to have full addresses in `lea` for possible patching. Right. Our assembler implements rip-relative addressing for some instructions, but apparently lea isn't one of them. I'll experiment with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1615033502 From duke at openjdk.org Mon May 27 00:12:04 2024 From: duke at openjdk.org (ExE Boss) Date: Mon, 27 May 2024 00:12:04 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > Please review this change that convert dynamic proxies implementations to hidden classes, intended to target JDK 24. > > Summary: > 1. Adds new implementation while preserving the old implementation behind `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. > 3. 
ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. > > Testing: tier1 and tier2 have no related failures. > > Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. `useLegacyProxyImpl && !useOldSerializableConstructor` would always be `false` when `useOldSerializableConstructor` is `true`, which is the opposite of what's described in the CSR. src/java.base/share/classes/jdk/internal/reflect/ReflectionFactory.java line 557: > 555: public static boolean useLegacyProxyImpl() { > 556: var config = config(); > 557: return config.useLegacyProxyImpl && !config.useOldSerializableConstructor; Suggestion: return config.useLegacyProxyImpl || config.useOldSerializableConstructor; src/java.base/share/classes/jdk/internal/reflect/ReflectionFactory.java line 624: > 622: "true".equals(props.getProperty("jdk.disableSerialConstructorChecks")); > 623: > 624: useLegacyProxyImpl &= !useOldSerializableConstructor; Suggestion: useLegacyProxyImpl |= useOldSerializableConstructor; ------------- PR Review: https://git.openjdk.org/jdk/pull/19356#pullrequestreview-2079825251 PR Review Comment: https://git.openjdk.org/jdk/pull/19356#discussion_r1615362157 PR Review Comment: https://git.openjdk.org/jdk/pull/19356#discussion_r1615362271 From lmesnik at openjdk.org Mon May 27 00:50:02 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 27 May 2024 00:50:02 GMT Subject: RFR: 8332917: failure_handler should execute gdb "info threads" command on linux In-Reply-To: References: Message-ID: <3aStAfJCdcDXUKvKJHj8Wzd99yK6QVeM-1rCnyftFZo=.e7230731-7dbc-43e0-9c20-463871b8bdce@github.com> On Fri, 24 May 2024 19:45:21 GMT, Chris Plummer wrote: > On linux,
failure_handler dumps stack traces for all threads, but this dump does not include the name of each thread. The gdb "info threads" command will give a summary of all threads, and if debugging process, the summary will include each thread's name. If debugging a core file, for some reason the thread name is not included, but the summary is still useful. > > Tested by running some tests that fail with a timeout, and looking at the failure_handler gdb output for both the process and the core file. Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19401#pullrequestreview-2079839432 From liach at openjdk.org Mon May 27 01:25:11 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 27 May 2024 01:25:11 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: <7Kf2Il9AOTNK5iJrHGm0ta37FjRE-1MpVHgv14NA8x0=.0440b81c-607b-4df6-a725-6de09b31b30a@github.com> On Mon, 27 May 2024 00:03:41 GMT, ExE Boss wrote: >> Please review this change that convert dynamic proxies implementations to hidden classes, intended to target JDK 24. >> >> Summary: >> 1. Adds new implementation while preserving the old implementation behind `-Djdk.reflect.useLegacyProxyImpl=true` in case there are compatibility issues. >> 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. >> 3. ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. >> >> Testing: tier1 and tier2 have no related failures. >> >> Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. 
> > src/java.base/share/classes/jdk/internal/reflect/ReflectionFactory.java line 557: > >> 555: public static boolean useLegacyProxyImpl() { >> 556: var config = config(); >> 557: return config.useLegacyProxyImpl && !config.useOldSerializableConstructor; > > Suggestion: > > return config.useLegacyProxyImpl || config.useOldSerializableConstructor; This site can actually simply be `config.useLegacyProxyImpl` as it's initialized in `loadConfig`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19356#discussion_r1615382817 From ddong at openjdk.org Mon May 27 01:47:16 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 27 May 2024 01:47:16 GMT Subject: Withdrawn: 8326012: JFR: Event for time to safepoint In-Reply-To: <68hS0kQgtDIk4ioAJj_r0_GLT6h0lcif6Daj6WRwxlI=.40c2a6e7-70a8-4954-bcde-9318ee311028@github.com> References: <68hS0kQgtDIk4ioAJj_r0_GLT6h0lcif6Daj6WRwxlI=.40c2a6e7-70a8-4954-bcde-9318ee311028@github.com> Message-ID: On Fri, 16 Feb 2024 03:59:36 GMT, Denghui Dong wrote: > There are now some JFR events related to safepoint. When time-to-safepoint (aka ttsp) is too long, these events could not be very helpful since based on them we cannot know which threads cause it and what those threads are doing. > > Users can use `-XX:+SafepointTimeout -XX:SafepointTimeoutDelay=100` to see the threads that don't reach safepoint in time but without stack traces. Using `-XX:+ AbortVMOnSafepointTimeout` can capture the stack traces but it crashes the process, hence it's not sensible to enable the flag in production. > > ~~This patch adds a new JFR event `EventSafepointTimeout` to record the threads that cause ttsp too long.~~ > > ~~This event includes two fields:~~ > > ~~- safepointId: the relevant safepoint id~~ > ~~- timeExceeded: the amount of time exceeding `SafepointTimeoutDelay` used by the thread to reach safepoint~~ > > ~~In the current version, this event records the stack of those problematic threads when they finally reach safepoint. 
Hence, there is a bias, but it's still helpful to deduce the root place.~~ > > A better implementation is to record a more accurate stack, but this will increase complexity. At the same time, the native stack may also be important for this problem, but it is not currently supported by JFR. > > Any input would be greatly appreciated. > > Testing: jdk/jdk/jfr This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17888 From dholmes at openjdk.org Mon May 27 02:18:22 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 02:18:22 GMT Subject: RFR: 8329958: jdk22 win x86 make fails: downcallLinker.cpp(36) redefinition Message-ID: Trivial fix to add JNICALL to the function declaration. This will be backported to JDK 22. Testing: - tier1 sanity builds Thanks ------------- Commit messages: - 8329958: jdk22 win x86 make fails: downcallLinker.cpp(36) redefinition Changes: https://git.openjdk.org/jdk/pull/19406/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19406&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329958 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19406.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19406/head:pull/19406 PR: https://git.openjdk.org/jdk/pull/19406 From duke at openjdk.org Mon May 27 03:14:24 2024 From: duke at openjdk.org (kuaiwei) Date: Mon, 27 May 2024 03:14:24 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: > he origin patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: > 1 It show regression in some platform, like Apple silicon in mac os > 2 Can not handle instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" > > It can be fixed by: > 1 Enable AlwaysMergeDMB by default, only disable it in architecture we can see performance improvement (N1 or N2) > 2 Check the special pattern and merge the 
subsequent dmb. > > It also fix a bug when code buffer is expanding, st/ld/dmb can not be merged. I added unit tests for these. > > This patch still has a unhandled case. Insts like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and can not merge all three. Because when emitting dmb.ish, if merge all previous dmbs, the code buffer will shrink the size. I think it may break some resumption and think it's not a common pattern. > > In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Remove tailing white space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19278/files - new: https://git.openjdk.org/jdk/pull/19278/files/00262c4c..8ef3e037 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278 PR: https://git.openjdk.org/jdk/pull/19278 From djelinski at openjdk.org Mon May 27 05:26:06 2024 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Mon, 27 May 2024 05:26:06 GMT Subject: Integrated: 8332724: x86 MacroAssembler may over-align code In-Reply-To: References: Message-ID: <7EuivUMlD7SgjPuysNFh2DYSnuKLmRlDEPwMhlfXR30=.fd7f9aca-eb6d-47c8-9124-e4a6d21c6545@github.com> On Wed, 22 May 2024 19:04:27 GMT, Daniel Jeliński wrote: > The methods align32 and align64 are supposed to align the next instruction to the next 32 or 64 byte boundary using the minimum number of NOP bytes. However, when the target represented as a 32bit signed int is negative, the instructions generate 32 or 64 NOP bytes too many.
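A minimal sketch of the over-alignment described above, not the actual HotSpot code. It assumes the old padding computation used a signed remainder, which would be consistent with the 63-byte observation reported in this PR description, and contrasts it with a computation based on bit operations. The helper names `naive_padding` and `bitwise_padding` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical illustration only; these helpers are not HotSpot functions.

// Padding computed with a signed remainder. In C++ the result of % takes the
// sign of the dividend, so a negative target yields a negative remainder and
// the computed padding exceeds the modulus.
constexpr int naive_padding(int modulus, int32_t target) {
  return (target % modulus != 0) ? modulus - (target % modulus) : 0;
}

// Padding computed with bit operations on the unsigned value of the target.
// Assumes modulus is a power of two (32 or 64 in the discussion above).
constexpr uint32_t bitwise_padding(uint32_t modulus, int32_t target) {
  return (modulus - (static_cast<uint32_t>(target) & (modulus - 1))) & (modulus - 1);
}

// A target of -31 needs 31 bytes to reach the next 32-byte boundary.
static_assert(naive_padding(32, -31) == 63, "signed remainder over-aligns");
static_assert(bitwise_padding(32, -31) == 31, "bit operations give the minimum");
```

For non-negative targets the two helpers agree; for example, both return 31 for a target of 33 and a modulus of 32.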
This was observed in `jbyte_disjoint_arraycopy_avx3` on a Linux machine, where a single align32 invocation generated 63 bytes of NOPs. > > This PR addresses the problem by using bit operations to calculate the required number of bytes. > > Tier1-3 tests passed. > > On a side note, `align64` and `align32` instructions were meant for aligning data for use with zmm / ymm loads, but nowadays they are frequently used in places where `align(CodeEntryAlignment)` or `align(OptoLoopAlignment)` would be more appropriate. I can address that in a separate PR if you think it's worth fixing. This pull request has now been integrated. Changeset: 08d51003 Author: Daniel Jeliński URL: https://git.openjdk.org/jdk/commit/08d51003d142e89b9d2f66187a4ea50e12b94fbb Stats: 9 lines in 4 files changed: 0 ins; 0 del; 9 mod 8332724: x86 MacroAssembler may over-align code Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/19353 From dholmes at openjdk.org Mon May 27 05:29:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 05:29:11 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v4] In-Reply-To: <7AWghiG_TSVMjkfVfA_krBMWZNMRVlakI7kny1tuJ9s=.d4ca3b29-923a-48e6-80d7-97c72ea6e308@github.com> References: <7AWghiG_TSVMjkfVfA_krBMWZNMRVlakI7kny1tuJ9s=.d4ca3b29-923a-48e6-80d7-97c72ea6e308@github.com> Message-ID: On Fri, 24 May 2024 23:25:15 GMT, Calvin Cheung wrote: >> Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. >> >> This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters.
>> >> Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. >> >> Passed tiers 1 - 4 testing. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments I have a few concerns about the way this is being put together. The coupling between the use of the perf counters and the unified logging seems awkward to me. src/hotspot/share/cds/dynamicArchive.cpp line 123: > 121: > 122: log_info(cds,dynamic)("CDS dynamic dump: clinit = " INT64_FORMAT "ms)", > 123: (int64_t)ClassLoader::class_init_time_ms()); Nit: just use JLONG_FORMAT and avoid the cast src/hotspot/share/classfile/classLoader.cpp line 144: > 142: log.print_cr("ClassLoader:"); > 143: log.print_cr(" clinit: " INT64_FORMAT "ms / " INT64_FORMAT " events", (int64_t)ClassLoader::class_init_time_ms(), (int64_t)ClassLoader::class_init_count()); > 144: log.print_cr(" link methods: " INT64_FORMAT "ms / " INT64_FORMAT " events", (int64_t)Management::ticks_to_ms(_perf_ik_link_methods_time->get_value()) , (int64_t)_perf_ik_link_methods_count->get_value()); Why are you casting all the jlong values to int64_t instead of just using JLONG_FORMAT? src/hotspot/share/runtime/java.cpp line 165: > 163: ClassLoader::print_counters(); > 164: } > 165: } This method seems unnecessary. Inside `print_counters` it checks if the log is enabled and whether `ProfileClassLinkage` is set, so no need to check the log is enabled here. Wherever this is called you should just call `ClassLoader::print_counters` directly. (Further the "init" part of the name is only meaningful for the call site at the end of VM initialization.) src/hotspot/share/runtime/java.cpp line 367: > 365: ThreadsSMRSupport::log_statistics(); > 366: > 367: log_vm_init_stats(); Do we really want to call `ClassLoader::print_counters` here? 
IIUC most everything else here is printing to tty, but `ClassLoader::print_counters` will "print" to whereever the logging has been configured. (` ThreadsSMRSupport::log_statistics` seems similarly misplaced as it too uses logging). src/hotspot/share/runtime/perfData.hpp line 838: > 836: } > 837: > 838: const char* name() const { return (_timerp != nullptr) ? _timerp->name() : nullptr; } So now all the callers of this need a null check too. I wonder if this should just be an assertion check, as we should only ever call this when we have encountered a valid/live counter. src/hotspot/share/runtime/threads.cpp line 832: > 830: > 831: if (ProfileClassLinkage) { > 832: log_info(init)("Before main:"); "Before main: " ?? That seems very launcher specific. How about "At VM initialization completion"? ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18790#pullrequestreview-2079957601 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1615453134 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1615453989 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1615467369 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1615480495 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1615458516 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1615460402 From dholmes at openjdk.org Mon May 27 05:29:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 05:29:12 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> Message-ID: On Fri, 24 May 2024 05:21:36 GMT, Calvin Cheung wrote: >> src/hotspot/share/runtime/arguments.cpp line 3759: >> >>> 
3757: if (log_is_enabled(Info, init)) { >>> 3758: FLAG_SET_ERGO_IF_DEFAULT(ProfileClassLinkage, true); >>> 3759: } >> >> What if ProfileClassLinkage is set true on the command-line without -Xlog:init? That doesn't seem to make sense to me. So I'm not clear why it is a settable diagnostic flag. > > If only `ProfileClassLinkage` is set to true without `-Xlog:init`, the user will not see any counters output. > In `java.cpp`: > > 160 void log_vm_init_stats() { > 161 LogStreamHandle(Info, init) log; > 162 if (log.is_enabled()) { > 163 ClassLoader::print_counters(); > 164 } > 165 } > > > In the future, there will be other sets of counters controlled by other diagnostic flags. Yeah I'm not really getting the control aspects here. If I turn on logging I should not get these new counters unless I explicitly ask for them - simply turning on the logging should not set `ProfileClassLinkage` IMO. But enabling `ProfileClassLinkage` should turn on `init` logging, else it serves no purpose. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1615456407 From dholmes at openjdk.org Mon May 27 06:14:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 06:14:02 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS In-Reply-To: References: Message-ID: On Sat, 11 May 2024 06:13:29 GMT, Thomas Stuefe wrote: > An exploded JDK cannot be used with either -Xshare:on or -Xshare:auto. That causes tests like runtime/CompressedOops/CompressedCPUSpecificClassSpaceReservation.java to fail when running on an exploded JDK. > > Since an exploded JDK cannot use CDS, we should - for tests - treat it as if CDS had not been included. > > > ---- > > Note that I was torn between two ways to fix this: > > - either this fix, which is rather simple and automatically updates the "vm.cds" `@requires` property > - or to expose "exploded-ness" as a boolean property via `WhiteBox` and `VMProps`(`jdk.exploded`). 
See this draft PR: https://github.com/openjdk/jdk/pull/19178 . > > The latter is cleaner and clearer, conveying the message of exploded-ness without muddling it with the CDS aspect. But OTOH the complexity may not be required. > > I can go either way, though I have a slight preference for this PR, which is why I posted it. Seems okay. This test should have had `requires vm.cds` anyway. Just out of curiosity why is CDS not compatible with an exploded build? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19188#pullrequestreview-2080060349 From dholmes at openjdk.org Mon May 27 06:26:00 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 06:26:00 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: On Fri, 24 May 2024 13:30:41 GMT, Matthias Baesken wrote: > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread 
(/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. > So add an attribute to this function that it has undefined behavior. > See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" As Thomas notes we intentionally want to test a synchronous signal is possible, so doing the minimum we can to "fix" ubsan is fine by me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19394#pullrequestreview-2080075594 From mbaesken at openjdk.org Mon May 27 06:45:04 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 06:45:04 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: On Sat, 25 May 2024 15:12:10 GMT, Kim Barrett wrote: >> When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : >> >> runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr >> >> /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero >> #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 >> #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 >> #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 >> #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 >> #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 >> #5 0x7f16c5dbd0e5 in JavaMain 
/jdk/src/java.base/share/native/libjli/java.c:491 >> #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 >> #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) >> >> Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. >> So add an attribute to this function that it has undefined behavior. >> See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" > > src/hotspot/share/utilities/vmError.cpp line 2093: > >> 2091: static void ALWAYSINLINE crash_with_sigfpe() { >> 2092: >> 2093: // generate a native synchronous SIGFPE where possible; > > Maybe simpler would be to change the definition to only use the divide-by-zero > approach for _WIN32 and always use the currently conditional fallback to > pthread_kill on non-_WIN32. Especially in light of the fact that the > divide-by-zero approach doesn't work on some platforms. > > I also wonder if the comment about OSX incorrectly implementing raise is > correct? Maybe that's been fixed? Or maybe it's not a bug, but a BSD-ism? I do not know about the OSX specific issues, maybe someone else can comment? Regarding the handling on UNIX, Thomas commented and I think the coding should better stay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19394#discussion_r1615546824 From rehn at openjdk.org Mon May 27 06:45:30 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 27 May 2024 06:45:30 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v7] In-Reply-To: References: Message-ID: > Hi, please consider! 
> > Materializing a 48-bit pointer, using an additional register, we can do with: > lui + lui + slli + add + addi > This 15% faster both on VF2 and in CPU models, compared to movptr(). > > As we often materialize during calls there is free registers. > > I have choose just a few spot to use it, many more can use. > E.g. la() with tmp register can use li48 instead of movptr. > > Running tests now (so far so good), as if I screwed up IC calls it should be seen fast. > And benchmarks when hardware is free. Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Merge branch 'master' into 8332265 - Fixed more comments - Fixed comments - Merge branch 'master' into 8332265 - More review comments - Review changes - Merge branch 'master' into 8332265 - Merge branch 'master' into 8332265 - Small review update - li48 -> movptr - ... and 2 more: https://git.openjdk.org/jdk/compare/16dba04e...8dff3f12 ------------- Changes: https://git.openjdk.org/jdk/pull/19246/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19246&range=06 Stats: 212 lines in 8 files changed: 123 ins; 13 del; 76 mod Patch: https://git.openjdk.org/jdk/pull/19246.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19246/head:pull/19246 PR: https://git.openjdk.org/jdk/pull/19246 From mbaesken at openjdk.org Mon May 27 06:52:00 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 06:52:00 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: <8S84rKDoyx-JF6oYqUMhmEJBnz5in_PuTUf4bAZ1PNs=.d58d27fd-7599-4195-8bca-b96f525e83a9@github.com> On Sun, 26 May 2024 05:09:37 GMT, Julian Waters wrote: > Would've used the C++14 attribute syntax for this, but oh well You can put it on the list, why C++ 14 is desired (at some point in future sooner or 
later we will go anyway to some more current version of the standard). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19394#issuecomment-2132761559 From mbaesken at openjdk.org Mon May 27 07:09:26 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 07:09:26 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero [v2] In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: <1SICcvVvxShSN5au4kjtezCp5qINLWtIbLt3_7e0QdQ=.23665d97-6320-46d9-bafc-19716b631b21@github.com> > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. 
> So add an attribute to this function that it has undefined behavior. > See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: adjust comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19394/files - new: https://git.openjdk.org/jdk/pull/19394/files/c747e0c3..b297f194 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19394&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19394&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19394.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19394/head:pull/19394 PR: https://git.openjdk.org/jdk/pull/19394 From mbaesken at openjdk.org Mon May 27 07:09:26 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 07:09:26 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: <_nKQJMR9MM4cJ5GzG0ksIgSYVFafqBkSyDJOdUPZ5YM=.0bf72dbc-8c03-4154-a685-6edf59db8b1c@github.com> On Fri, 24 May 2024 13:30:41 GMT, Matthias Baesken wrote: > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner 
/jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. > So add an attribute to this function that it has undefined behavior. > See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" I adjusted the comment a bit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19394#issuecomment-2132786570 From tschatzl at openjdk.org Mon May 27 07:18:31 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 27 May 2024 07:18:31 GMT Subject: RFR: 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes [v2] In-Reply-To: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> References: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> Message-ID: <3qM2hA4GZo8qDs3r65uy12U3KM1bmusrmUvcG667cdw=.4ebeb222-e255-4e90-a1eb-0b875551ae06@github.com> > Hi all, > > please review this change that avoids posting Free->Free and Old->Old region transitions in JFR. 
> > The reason for these could have been: > * Free->Free: heap shrinking and full gc > * Old->Old: heap shrinking, full gc or evacuation failure in an old region > > Parts of this change has been contributed by @ansteiner , crediting him for this (the first commit). > > Testing: tier1-3, tier5, all "detailed" JFR test cases > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into submit/80330577-jfr-heap-region-type-transitions - Fix errorneous Old->Old transitions which actually were Free->Old. Add test - JDK-8330577 ------------- Changes: https://git.openjdk.org/jdk/pull/19389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19389&range=01 Stats: 113 lines in 2 files changed: 110 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19389/head:pull/19389 PR: https://git.openjdk.org/jdk/pull/19389 From dholmes at openjdk.org Mon May 27 07:41:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 07:41:06 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero [v2] In-Reply-To: References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: <2l2cPdYfocVcIFfNOjfkyKOF6W0_3JGDzZlUoRyWo38=.c6cb7e9c-e5b6-467c-a51a-312f2f5e6444@github.com> On Mon, 27 May 2024 06:42:42 GMT, Matthias Baesken wrote: >> src/hotspot/share/utilities/vmError.cpp line 2093: >> >>> 2091: static void ALWAYSINLINE crash_with_sigfpe() { >>> 2092: >>> 2093: // generate a native synchronous SIGFPE where possible; >> >> Maybe simpler would be to change the definition to only use the divide-by-zero >> approach for _WIN32 and always use the currently conditional fallback to >> pthread_kill on non-_WIN32. 
Especially in light of the fact that the >> divide-by-zero approach doesn't work on some platforms. >> >> I also wonder if the comment about OSX incorrectly implementing raise is >> correct? Maybe that's been fixed? Or maybe it's not a bug, but a BSD-ism? > > I do not know about the OSX specific issues, maybe someone else can comment? > Regarding the handling on UNIX, Thomas commented and I think the coding should better stay. macOS `raise` raises the signal to the process not the thread (per Posix requirements). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19394#discussion_r1615608795 From tschatzl at openjdk.org Mon May 27 07:49:31 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 27 May 2024 07:49:31 GMT Subject: RFR: 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes [v3] In-Reply-To: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> References: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> Message-ID: > Hi all, > > please review this change that avoids posting Free->Free and Old->Old region transitions in JFR. > > The reason for these could have been: > * Free->Free: heap shrinking and full gc > * Old->Old: heap shrinking, full gc or evacuation failure in an old region > > Parts of this change has been contributed by @ansteiner , crediting him for this (the first commit). 
> > Testing: tier1-3, tier5, all "detailed" JFR test cases > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Fix compilation after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19389/files - new: https://git.openjdk.org/jdk/pull/19389/files/be8585b3..75c8027c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19389&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19389&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19389/head:pull/19389 PR: https://git.openjdk.org/jdk/pull/19389 From mbaesken at openjdk.org Mon May 27 07:52:01 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 07:52:01 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero [v2] In-Reply-To: <2l2cPdYfocVcIFfNOjfkyKOF6W0_3JGDzZlUoRyWo38=.c6cb7e9c-e5b6-467c-a51a-312f2f5e6444@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> <2l2cPdYfocVcIFfNOjfkyKOF6W0_3JGDzZlUoRyWo38=.c6cb7e9c-e5b6-467c-a51a-312f2f5e6444@github.com> Message-ID: On Mon, 27 May 2024 07:38:52 GMT, David Holmes wrote: >> I do not know about the OSX specific issues, maybe someone else can comment? >> Regarding the handling on UNIX, Thomas commented and I think the coding should better stay. > > macOS `raise` raises the signal to the process not the thread (per Posix requirements). Hi David, so the comment // OSX implements raise(sig) incorrectly so we need to // explicitly target the current thread seems to be not correct, should we change it e.g. 
to your comment // macOS raise raises the signal to the process not the thread (per POSIX requirements) // so we need to explicitly target the current thread ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19394#discussion_r1615622848 From aph at openjdk.org Mon May 27 08:29:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 May 2024 08:29:01 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS In-Reply-To: References: Message-ID: On Mon, 27 May 2024 06:11:20 GMT, David Holmes wrote: > Seems okay. This test should have had `requires vm.cds` anyway. > > Just out of curiosity why is CDS not compatible with an exploded build? Isn't the exploded build supposed to be as fast as possible? I think that's why people use it, and it'd be a shame to allow anything, such as building a CDS archive, to slow that process down. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2132932378 From stuefe at openjdk.org Mon May 27 09:16:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 May 2024 09:16:01 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS In-Reply-To: References: Message-ID: On Mon, 27 May 2024 08:26:32 GMT, Andrew Haley wrote: > > Seems okay. This test should have had `requires vm.cds` anyway. > > Just out of curiosity why is CDS not compatible with an exploded build? @dholmes-ora Thanks for the review. Honestly, I don't know. Maybe @iklam knows. > > Isn't the exploded build supposed to be as fast as possible? I think that's why people use it, and it'd be a shame to allow anything, such as building a CDS archive, to slow that process down. Sure, but not generating a CDS archive at build time and being unable to dump or use an archive at all are two different things.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2133031419 From azafari at openjdk.org Mon May 27 09:28:04 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 27 May 2024 09:28:04 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v4] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> Message-ID: <_ca1EJiY646IzwhzyaYcFalLhCVmQuvICmvHnOL2cUk=.884fcf4b-7cf6-4d24-9cf8-0b423add835c@github.com> On Fri, 24 May 2024 13:46:15 GMT, Afshin Zafari wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. 
>> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > more fixes. fixes and further discussions are added. ------------- PR Review: https://git.openjdk.org/jdk/pull/19343#pullrequestreview-2080464778 From tschatzl at openjdk.org Mon May 27 09:52:09 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 27 May 2024 09:52:09 GMT Subject: RFR: 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes [v3] In-Reply-To: References: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> Message-ID: On Fri, 24 May 2024 14:47:33 GMT, Andreas Steiner wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation after merge > > LGTM Thanks @ansteiner @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/19389#issuecomment-2133097573 From tschatzl at openjdk.org Mon May 27 09:52:10 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 27 May 2024 09:52:10 GMT Subject: Integrated: 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes In-Reply-To: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> References: <4B5e_9phnbHwNVMi-muq8EweQTt58Wm3KTL_Psjbt9w=.bc00798c-9947-4823-9ff4-d95da2e88e40@github.com> Message-ID: On Fri, 24 May 2024 11:26:57 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that avoids posting Free->Free and Old->Old region transitions in JFR. > > The reason for these could have been: > * Free->Free: heap shrinking and full gc > * Old->Old: heap shrinking, full gc or evacuation failure in an old region > > Parts of this change has been contributed by @ansteiner , crediting him for this (the first commit). 
> > Testing: tier1-3, tier5, all "detailed" JFR test cases > > Thanks, > Thomas This pull request has now been integrated. Changeset: 72fbfe18 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/72fbfe18cb20274bab2057f3d67920e0c86c5793 Stats: 112 lines in 2 files changed: 110 ins; 0 del; 2 mod 8330577: G1 sometimes sends jdk.G1HeapRegionTypeChange for non-changes Co-authored-by: Andreas Steiner Reviewed-by: ayang, asteiner ------------- PR: https://git.openjdk.org/jdk/pull/19389 From stuefe at openjdk.org Mon May 27 09:57:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 May 2024 09:57:01 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero [v2] In-Reply-To: <1SICcvVvxShSN5au4kjtezCp5qINLWtIbLt3_7e0QdQ=.23665d97-6320-46d9-bafc-19716b631b21@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> <1SICcvVvxShSN5au4kjtezCp5qINLWtIbLt3_7e0QdQ=.23665d97-6320-46d9-bafc-19716b631b21@github.com> Message-ID: <0shP9gP28ptofIjTO_MpLcxCoEHDyBIbNRliuOROeSg=.9c5bcb4b-05a2-4d55-bbc5-e6e473973d86@github.com> On Mon, 27 May 2024 07:09:26 GMT, Matthias Baesken wrote: >> When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : >> >> runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr >> >> /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero >> #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 >> #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 >> #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 >> #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 >> #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 >> #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 >> #6 0x7f16c5dc6748 in 
ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 >> #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) >> >> Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. >> So add an attribute to this function that it has undefined behavior. >> See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust comment +1 ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19394#pullrequestreview-2080539515 From aturbanov at openjdk.org Mon May 27 10:09:08 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 27 May 2024 10:09:08 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v9] In-Reply-To: References: Message-ID: On Thu, 23 May 2024 08:28:14 GMT, Johan Sj?len wrote: >> Hi, >> >> This PR introduces the possibility of using references more often when using GrowableArray, where as previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. >> >> >> Some example code: >> ```c++ >> // Before this patch this worked: >> GrowableArray arr(8,8,-1); // Pre-fill with 8 -1s >> int& x = arr.at(7); >> if (x == -1) { >> x = 2; >> } >> assert(arr.at(7) == 2, "this holds"); >> // but this was forbidden >> int& x = arr.at_grow(9, -1); // Compilation error! 
at_grow returns E, not E& >> // so we had to do >> int x = arr.at_grow(9, -1); >> if (x == -1) { >> arr.at_put(9, 2); >> } >> >> >> Thanks. > > Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/return-reference' into return-reference > - Use references when using top() src/hotspot/share/utilities/growableArray.hpp line 173: > 171: E const& top() const { > 172: assert(_len > 0, "empty"); > 173: return _data[_len - 1]; Suggestion: return _data[_len - 1]; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1615830435 From jsjolen at openjdk.org Mon May 27 10:12:18 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 27 May 2024 10:12:18 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Sat, 25 May 2024 06:01:01 GMT, Thomas Stuefe wrote: >> That doesn't seem right to me. `ssize_t` has a guaranteed range of `[-1, INT_MAX)`, the -1 being there for errors. We need as full of a range of negative numbers as possible. >> >> Good question regarding 32-bit, will have to think about that one. >> >> Btw: Yes, I know, we can underflow or overflow the diff, but in practice no one will allocate `2**64` bytes, I am willing to take that risk. > > Hm. We use ssize_t in many places for working with memory deltas, It works on all our platforms. And I don't see a good alternative here. int64 is not a replacement. To me, int64 feels like using void* for pointers, obfuscating intent. Its a memory size, I'd like therefore to see that in code. Apart from the obvious 64/32 bit problem. > > One could probably use ptrdiff_t, but no pointers are involved, it seems awkward. > > One could use size_t+boolean tupels, and that may be the hypercorrect way. 
It would solve the problem of not being > >> Btw: Yes, I know, we can underflow or overflow the diff, but in practice no one will allocate 2**64 bytes, I am willing to take that risk. > > An overflow would occur at 2**63. On 32-bit, on 2**31. So, on 32-bit you cannot express data sizes >= 2GB correctly. Which seems to me like a real limit. > > Maybe a size_t+boolean tuple is the right way to go. You can express large ranges with `int64_t`. There's no problem on 32-bit systems, just a bit of extra instructions emitted and some stack spillage. Also gah, I was sure that I wrote 63 and not 64 :-). Why not just typedef the `int64_t` as `delta` and call it a day? >An overflow would occur at 2**63. On 32-bit, on 2**31. So, on 32-bit you cannot express data sizes >= 2GB correctly. Which seems to me like a real limit. If anything, we should switch the `position` from `size_t` to `uint64_t` in `VMATree`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1615833813 From mbaesken at openjdk.org Mon May 27 10:30:08 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 10:30:08 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero [v2] In-Reply-To: <1SICcvVvxShSN5au4kjtezCp5qINLWtIbLt3_7e0QdQ=.23665d97-6320-46d9-bafc-19716b631b21@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> <1SICcvVvxShSN5au4kjtezCp5qINLWtIbLt3_7e0QdQ=.23665d97-6320-46d9-bafc-19716b631b21@github.com> Message-ID: On Mon, 27 May 2024 07:09:26 GMT, Matthias Baesken wrote: >> When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : >> >> runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr >> >> /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero >> #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 >> #1 0x7f16bc531f32 in VMError::controlled_crash(int)
/jdk/src/hotspot/share/utilities/vmError.cpp:2137 >> #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 >> #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 >> #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 >> #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 >> #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 >> #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) >> >> Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. >> So add an attribute to this function that it has undefined behavior. >> See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . "Floating point division by zero. This is undefined per the C and C++ standards" > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust comment Thanks for the reviews ! 
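For readers following this thread, the idea discussed above (deliberately provoke a SIGFPE, tell UBSan not to instrument that one function, and fall back to signalling the current thread explicitly) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the macro name `NO_UBSAN_SKETCH`, the helpers `crash_with_sigfpe_sketch` and `sigfpe_is_delivered`, and the choice of the GCC/Clang `no_sanitize("undefined")` attribute are all hypothetical, and the sketch is POSIX-only. Unlike the VM, it installs a handler and jumps back out so that the effect can be observed without terminating the process.

```c++
// Illustrative sketch only; names below are hypothetical, not HotSpot code.
#include <cassert>    // used by the usage check mentioned after the block
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>

// Assumption: GCC and Clang accept no_sanitize("undefined");
// on other compilers the macro expands to nothing.
#if defined(__clang__) || defined(__GNUC__)
#define NO_UBSAN_SKETCH __attribute__((no_sanitize("undefined")))
#else
#define NO_UBSAN_SKETCH
#endif

static sigjmp_buf g_recover;     // where the handler jumps back to
static volatile int g_zero = 0;  // volatile so the division is not folded away

extern "C" void on_sigfpe(int) {
  // Leave the handler without re-executing the faulting instruction.
  siglongjmp(g_recover, 1);
}

// Deliberately provokes SIGFPE. The attribute asks UBSan not to report
// the intentional division by zero in this one function.
NO_UBSAN_SKETCH static void crash_with_sigfpe_sketch() {
  g_zero = g_zero / g_zero;  // traps on x86; may not trap on other CPUs
  // Fallback if the division did not trap: signal exactly this thread.
  pthread_kill(pthread_self(), SIGFPE);
}

// Returns true if SIGFPE reached the handler on the calling thread.
bool sigfpe_is_delivered() {
  struct sigaction sa = {};
  struct sigaction old_sa = {};
  sa.sa_handler = on_sigfpe;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGFPE, &sa, &old_sa) != 0) {
    return false;
  }
  bool delivered = false;
  if (sigsetjmp(g_recover, 1) == 0) {  // 1: also save the signal mask
    crash_with_sigfpe_sketch();        // not expected to return normally
  } else {
    delivered = true;                  // reached via siglongjmp
  }
  sigaction(SIGFPE, &old_sa, nullptr); // restore the previous handler
  return delivered;
}
```

In a small test program, `assert(sigfpe_is_delivered());` should hold whether the division itself traps or the `pthread_kill` fallback is what delivers the signal. The fallback targets the thread explicitly, which matches the point made earlier in the thread about `raise` on macOS.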
------------- PR Comment: https://git.openjdk.org/jdk/pull/19394#issuecomment-2133170947 From mbaesken at openjdk.org Mon May 27 10:30:08 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 27 May 2024 10:30:08 GMT Subject: Integrated: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> Message-ID: On Fri, 24 May 2024 13:30:41 GMT, Matthias Baesken wrote: > When running with ubsan enabled on Linux x86_64, I get in the HS :tier1 tests this error : > > runtime/ErrorHandling/TestDwarf_dontCheckDecoder.jtr > > /jdk/src/hotspot/share/utilities/vmError.cpp:2090:26: runtime error: division by zero > #0 0x7f16bc531f32 in crash_with_sigfpe /jdk/src/hotspot/share/utilities/vmError.cpp:2090 > #1 0x7f16bc531f32 in VMError::controlled_crash(int) /jdk/src/hotspot/share/utilities/vmError.cpp:2137 > #2 0x7f16bea2d8fd in JNI_CreateJavaVM_inner /jdk/src/hotspot/share/prims/jni.cpp:3621 > #3 0x7f16bea2d8fd in JNI_CreateJavaVM /jdk/src/hotspot/share/prims/jni.cpp:3672 > #4 0x7f16c5dbd0e5 in InitializeJVM /jdk/src/java.base/share/native/libjli/java.c:1550 > #5 0x7f16c5dbd0e5 in JavaMain /jdk/src/java.base/share/native/libjli/java.c:491 > #6 0x7f16c5dc6748 in ThreadJavaMain /jdk/src/java.base/unix/native/libjli/java_md.c:642 > #7 0x7f16c5d756e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > #8 0x7f16c531550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > Reason is that we do a float division by zero to get a signal . This is desired by us so not really an error but ubsan cannot know this. > So add an attribute to this function that it has undefined behavior. > See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html (division by zero) . 
"Floating point division by zero. This is undefined per the C and C++ standards" This pull request has now been integrated. Changeset: 1b8dea4a Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/1b8dea4a9288c1518dc501a58d806c7365ea68b3 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/19394 From jsjolen at openjdk.org Mon May 27 10:33:18 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 27 May 2024 10:33:18 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 07:36:48 GMT, Thomas Stuefe wrote: >> src/hotspot/share/nmt/nmtNativeCallStackStorage.hpp line 119: >> >>> 117: } >>> 118: } >>> 119: }; >> >> Possibly for follow up RFE: I would like to see number of stacks in the NMT statistic (there is this statistic subcommand to the jcmd VM.native_memory). I also would like to see those statistics in the hs-err file. >> >> For example, if we ever decide to track larger stacks (which would make a lot of sense, 4 frames is really not much), we will see a logarithmic (?) increase in number of stacks. I would like to know those numbers. Note that I sometimes do that during investigations, and I have a RFE open somewhere to make the number of frames in stacks tunable with a VM options. > > Note: mid-term we should place *all* stacks in here, not just those for tracking ZGC. And replace all physical copies of stacks with StackIndex. >we will see a logarithmic (?) increase in number of stacks I'd expect it to be exponential. Assume each function in the stack trace has a branch `if (a) alloc_A() else alloc_B()`, that causes a doubling of total stack sequences stored. I believe that we have the advantage of common subsequences, in which case we can implement it as a trie. 
That may ruin the `StackIndex` strategy however. Thoughts for the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1615855611 From stuefe at openjdk.org Mon May 27 11:36:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 May 2024 11:36:14 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Mon, 27 May 2024 10:09:38 GMT, Johan Sjölen wrote: >> Hm. We use ssize_t in many places for working with memory deltas, It works on all our platforms. And I don't see a good alternative here. int64 is not a replacement. To me, int64 feels like using void* for pointers, obfuscating intent. It's a memory size, I'd like therefore to see that in code. Apart from the obvious 64/32 bit problem. >> >> One could probably use ptrdiff_t, but no pointers are involved, it seems awkward. >> >> One could use size_t+boolean tuples, and that may be the hypercorrect way. It would solve the problem of not being >> >>> Btw: Yes, I know, we can underflow or overflow the diff, but in practice no one will allocate 2**64 bytes, I am willing to take that risk. >> >> An overflow would occur at 2**63. On 32-bit, on 2**31. So, on 32-bit you cannot express data sizes >= 2GB correctly. Which seems to me like a real limit. >> >> Maybe a size_t+boolean tuple is the right way to go. > > You can express large ranges with `int64_t`. There's no problem on 32-bit systems, just a bit of extra instructions emitted and some stack spillage. Also gah, I was sure that I wrote 63 and not 64 :-). Why not just typedef the `int64_t` as `delta` and call it a day? > >>An overflow would occur at 2**63. On 32-bit, on 2**31. So, on 32-bit you cannot express data sizes >= 2GB correctly. Which seems to me like a real limit. > > If anything, we should switch the `position` from `size_t` to `uint64_t` in `VMATree`. int64_t as delta_t would be a pragmatic compromise. On 32-bit, it allows representing the full +-4GB.
On 64-bit, nobody cares for deltas > 2^63. On 32-bit, the only annoying part then is casting to/from size_t. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1615923878 From jsjolen at openjdk.org Mon May 27 11:49:29 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 27 May 2024 11:49:29 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v111] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT 
doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with three additional commits since the last revision: - More extensive error reporting for broken trees during reporting - Rename report funs - Rename to table_size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/67626aca..16826181 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=110 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=109-110 Stats: 49 lines in 6 files changed: 32 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From jsjolen at openjdk.org Mon May 27 12:05:45 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 27 May 2024 12:05:45 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v112] In-Reply-To: References: Message-ID: > Hi, > > 
This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. 
When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Assert on the tracking level - Naming fixing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/16826181..58766b5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=111 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=110-111 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From dholmes at openjdk.org Mon May 27 12:15:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 12:15:04 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero [v2] In-Reply-To: References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> <2l2cPdYfocVcIFfNOjfkyKOF6W0_3JGDzZlUoRyWo38=.c6cb7e9c-e5b6-467c-a51a-312f2f5e6444@github.com> Message-ID: On Mon, 27 May 2024 07:49:00 GMT, Matthias Baesken wrote: >> macOS `raise` raises the signal to the process not the thread (per Posix requirements). 
> > Hi David, so the comment > > // OSX implements raise(sig) incorrectly so we need to > // explicitly target the current thread > > seems to be not correct, should we change it e.g. to your comment > > > // macOS raise raises the signal to the process not the thread (per Posix requirements) > // so we need to explicitly target the current thread The comment says raise is broken, it just doesn't say exactly how, though it is implied by the "we need to explicitly target the current thread". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19394#discussion_r1615969855 From jsjolen at openjdk.org Mon May 27 12:30:47 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 27 May 2024 12:30:47 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v113] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Include memtracker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/58766b5b..90b6f6ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=112 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=111-112 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From stefank at openjdk.org Mon May 27 13:22:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 May 2024 13:22:04 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v4] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> Message-ID: On Fri, 24 May 2024 13:46:15 GMT, Afshin Zafari wrote: >> This PR fixes the problems existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here: >> 1- `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in ctor. Since reserving memory regions may go thru a try and fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves. >> Also, some assertion are added to the getters of a `ReservedSpace` to check if the region is successfully reserved. >> >> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub regions `R1` and `R2` (`R == <---R1---><--R2-->`). 
At release time, NMT tracks only `R` and ignores releasing `R1` and `R2`. This ignoring is problematic when a requested region `R` is size-aligned to `R1---R---R2` first and then the `R1` and `R2` are released (`chop_extra_memory` function is called for this). In this case, NMT ignores tracking `R1` and `R2` with false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions-list and when a new reserve happens at that regions, NMT complains by raising an exception. >> >> Tests: >> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug} > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > more fixes. Changes requested by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19343#pullrequestreview-2080884905 From stefank at openjdk.org Mon May 27 13:22:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 May 2024 13:22:04 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> Message-ID: <8_W3gPFqX8RC7V2QvFSmKAOTEK4z6uHOf4NnA0RDp7A=.428d32d1-0624-48de-ac7b-0f1acc6a0a14@github.com> On Fri, 24 May 2024 11:48:37 GMT, Afshin Zafari wrote: >> Hmm. os::release_memory also calls `record_virtual_memory_release`, and then this code calls it again with a second ThreadCritical, but then it is called again with `extra_memory`. I still find this addition of `extra_memory` highly dubious. 
> > Some facts: > - `MemTracker::record_virtual_memory_release()` has no `ThreadCritical` internally and therefore should be called inside a critical section. > - When `os::release_memory()` returns, the `ThreadCritical` that is created there is destroyed and a new one should be created again here. > - Releasing a sub-region that flagged for CDS and is contained in a larger CDS region is ignored at `MemTracker::record_virtual_memory_release()`. It is a valid case due to the way that CDS reserves and/or releases regions. > - This exceptional case is notified to `MemTracker` by passing `true` as `extra_memory`. > - Inside `MemTracker`, the `extra_memory == true` is used in the places where the exceptional case should/would be addressed. There are a couple of "therefore", "should", and "is a valid case", above that makes it sound like this is the only way to implement things. My point is that I don't think it is, and I would like to see a step back where we try to think about a way to write this without adding these extra layers of code. With that said, Thomas is thinking about ways to change how we keep track of the NMT flags, so I hope that if we make those changes we could skip having to add the code here. >> The flags sent to the NMT subsystem is correct, but the flags recorded in the ReservedSpaces will be wrong, AFAIKT. You can probably verify that by adding asserts. > > If your comment refers only to these lines of code, they are already verified. Since, inside the split function, the sub-regions get the new flags and all the reserved and committed amounts are moved from the large region to the new ones. So, the accounting of memory is correct. > > FWIW, if we trace down the call at line 1346 of `total_space_rs = Metaspace::reserve_address_space_for_compressed_classes(total_range_size, false /* optimize_for_zero_base */);` the region may get different flags of `mtClass` or `mtMetaspace` based on the checked criteria down there. 
> > If you comment on all such cases, then I will double check for them and add assertion for. My point is that the `archive_space_rs` and `class_space_rs` can get the wrong flags assigned to them. The split functions don't change them. Right? I would like to see the code run through our testing with these checks: assert(archive_space_rs.nmt_flag() == mtClassShared, "Sanity"); assert(class_space_rs.nmt_flag() == mtClass, "Sanity"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1616042193 PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1616028639 From kvn at openjdk.org Mon May 27 15:32:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 27 May 2024 15:32:01 GMT Subject: RFR: 8329958: jdk22 win x86 make fails: downcallLinker.cpp(36) redefinition In-Reply-To: References: Message-ID: On Mon, 27 May 2024 02:13:44 GMT, David Holmes wrote: > Trivial fix to add JNICALL to the function declaration. > > This will be backported to JDK 22. > > Testing: > - tier1 sanity builds > > Thanks Looks good. Can you remove "jdk22" from title? It is confusing since the build fail with latest JDK too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19406#pullrequestreview-2081157647 From aph at openjdk.org Mon May 27 16:56:00 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 May 2024 16:56:00 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: References: Message-ID: On Fri, 24 May 2024 09:31:41 GMT, Martin Doerr wrote: > @theRealAph: It would be great if you could take a look and see if you can spot any bug. Especially, I wonder why `r_array_length` happens to be 0 in some cases, but x86 doesn't check. Why would it not be zero? Some classes don't have secondary super types. In addition, 12ns is very slow. I don't understand that. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2133828862 From aph at openjdk.org Mon May 27 17:14:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 May 2024 17:14:01 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: References: Message-ID: <9sxDci-Wb9APcvhuEgjkpiQ2t5DnWD3pOHlOhyCBLLg=.c6efe9a2-234b-4125-9211-57016073c04a@github.com> On Thu, 23 May 2024 14:11:36 GMT, Martin Doerr wrote: > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? How can we verify it? By comparing the performance using the micro benchmarks? Run all of tier1 with `-XX:+VerifySecondarySupers` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2133847120 From mdoerr at openjdk.org Mon May 27 17:17:01 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 27 May 2024 17:17:01 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: <9sxDci-Wb9APcvhuEgjkpiQ2t5DnWD3pOHlOhyCBLLg=.c6efe9a2-234b-4125-9211-57016073c04a@github.com> References: <9sxDci-Wb9APcvhuEgjkpiQ2t5DnWD3pOHlOhyCBLLg=.c6efe9a2-234b-4125-9211-57016073c04a@github.com> Message-ID: On Mon, 27 May 2024 17:11:41 GMT, Andrew Haley wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! >> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? >> How can we verify it? By comparing the performance using the micro benchmarks? >> >> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): >> >> Original >> SecondarySuperCacheHits: 13.033 ?(99.9%) 0.058 ns/op [Average] >> SecondarySuperCacheInterContention.test avgt 15 432.366 ? 8.364 ns/op >> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ? 
8.460 ns/op >> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ? 10.819 ns/op >> SecondarySuperCacheIntraContention.test avgt 15 355.192 ? 3.597 ns/op >> SecondarySupersLookup.testNegative00 avgt 15 12.274 ? 0.026 ns/op >> SecondarySupersLookup.testNegative01 avgt 15 12.300 ? 0.039 ns/op >> SecondarySupersLookup.testNegative02 avgt 15 12.304 ? 0.034 ns/op >> SecondarySupersLookup.testNegative03 avgt 15 12.276 ? 0.050 ns/op >> SecondarySupersLookup.testNegative04 avgt 15 12.235 ? 0.044 ns/op >> SecondarySupersLookup.testNegative05 avgt 15 12.308 ? 0.156 ns/op >> SecondarySupersLookup.testNegative06 avgt 15 12.291 ? 0.048 ns/op >> SecondarySupersLookup.testNegative07 avgt 15 12.307 ? 0.052 ns/op >> SecondarySupersLookup.testNegative08 avgt 15 12.398 ? 0.075 ns/op >> SecondarySupersLookup.testNegative09 avgt 15 12.552 ? 0.122 ns/op >> SecondarySupersLookup.testNegative10 avgt 15 12.490 ? 0.083 ns/op >> SecondarySupersLookup.testNegative16 avgt 15 12.565 ? 0.092 ns/op >> SecondarySupersLookup.testNegative20 avgt 15 19.059 ? 0.958 ns/op >> SecondarySupersLookup.testNegative30 avgt 15 19.268 ? 0.124 ns/op >> SecondarySupersLookup.testNegative32 avgt 15 20.059 ? 0.114 ns/op >> SecondarySupersLookup.testNegative40 avgt 15 25.117 ? 0.368 ns/op >> SecondarySupersLookup.testNegative50 avgt 15 32.735 ? 0.359 ns/op >> SecondarySupersLookup.testNegative55 avgt 15 34.866 ? 0.152 ns/op >> SecondarySupersLookup.testNegative56 avgt 15 35.492 ? 0.276 ns/op >> SecondarySupersLookup.testNegative57 avgt 15 36.620 ? 0.334 ns/op >> SecondarySupersLookup.testNegative58 avgt 15 37.226 ? 0.180 ns/op >> SecondarySupersLookup.testNegative59 avgt 15 37.774 ? 0.241 ns/op >> SecondarySupersLookup.testNegative60 avgt 15 38.627 ? 1.451 ns/op >> SecondarySupersLookup.testNegative61 avgt 15 ... > >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? 
How can we verify it? By comparing the performance using the micro benchmarks? > > Run all of tier1 with `-XX:+VerifySecondarySupers` > > @theRealAph: It would be great if you could take a look and see if you can spot any bug. Especially, I wonder why `r_array_length` happens to be 0 in some cases, but x86 doesn't check. > > Why would it not be zero? Some classes don't have secondary super types. In addition, 12ns is very slow. I don't understand that. I had to check for `r_array_length >= 0` here: https://github.com/openjdk/jdk/pull/19368/files#diff-0f708565c9e138b8013165540634368334f5d1df2ba437e39696e9791440050dR2312 The x86 implementation doesn't do that and I wonder why. Doesn't it access stale memory (https://github.com/openjdk/jdk/blob/be1d374bc54d43aae3b3c1feace22d38fe2156b6/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4967)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2133849808 From mdoerr at openjdk.org Mon May 27 18:49:01 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 27 May 2024 18:49:01 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: References: Message-ID: On Mon, 27 May 2024 16:53:13 GMT, Andrew Haley wrote: > 12ns is very slow. I don't understand that. Right, it's surprisingly slow regardless if the patch is applied or not. My x86 machine is about 8x faster. The PPC64 machine isn't optimized for single thread performance. It's configured to use SMT8 (8 threads per core). I guess s390 will achieve better single thread performance. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2133931855 From shade at openjdk.org Mon May 27 19:05:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 27 May 2024 19:05:02 GMT Subject: RFR: 8329958: jdk22 win x86 make fails: downcallLinker.cpp(36) redefinition In-Reply-To: References: Message-ID: On Mon, 27 May 2024 02:13:44 GMT, David Holmes wrote: > Trivial fix to add JNICALL to the function declaration. > > This will be backported to JDK 22. > > Testing: > - tier1 sanity builds > > Thanks Agreed with Vladimir's comment. Otherwise looks good. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19406#pullrequestreview-2081375388 From kbarrett at openjdk.org Mon May 27 19:45:05 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 27 May 2024 19:45:05 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero In-Reply-To: <78byblRrnZErliour4J6QZABWFIwihnpK6PImRmsVJI=.ded439ab-6548-40e6-a1b1-6b9162c71543@github.com> References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> <78byblRrnZErliour4J6QZABWFIwihnpK6PImRmsVJI=.ded439ab-6548-40e6-a1b1-6b9162c71543@github.com> Message-ID: On Sun, 26 May 2024 06:08:25 GMT, Thomas Stuefe wrote: > When this was written, the point was to raise a "real" SIGFPE. That matters because the behavior is subtly different from a real signal compared to one faked with raise (asynchronous vs synchronous). > > Among other things, this SIGFPE is used for regression testing https://bugs.openjdk.org/browse/JDK-8065895. [...] Thanks for the background info. In light of that, I think the approach of using an attribute to suppress the ubsan failure is good, and agree there needs to be some commentary added to explain what's going on. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19394#issuecomment-2133979645 From kbarrett at openjdk.org Mon May 27 20:31:05 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 27 May 2024 20:31:05 GMT Subject: RFR: 8332894: ubsan: vmError.cpp:2090:26: runtime error: division by zero [v2] In-Reply-To: References: <5Dnql_PtTbZUQxDqrnZBxmkE0ztmxwtom04vQWG--Z0=.fbf93839-c55f-4f69-80d0-9b4bc6a44a12@github.com> <2l2cPdYfocVcIFfNOjfkyKOF6W0_3JGDzZlUoRyWo38=.c6cb7e9c-e5b6-467c-a51a-312f2f5e6444@github.com> Message-ID: On Mon, 27 May 2024 12:12:39 GMT, David Holmes wrote: >> Hi David, so the comment >> >> // OSX implements raise(sig) incorrectly so we need to >> // explicitly target the current thread >> >> seems to be not correct, should we change it e.g. to your comment >> >> >> // macOS raise raises the signal to the process not the thread (per Posix requirements) >> // so we need to explicitly target the current thread > > The comment says raise is broken, it just doesn't say exactly how, though it is implied by the "we need to explicitly target the current thread". It's not "broken". OSX/darwin is BSD-derived, and does not always follow POSIX. https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/raise.3.html "The raise() function sends the signal sig to the current process." 
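In code terms, the process-versus-thread distinction above might look like the following. This is only a minimal sketch, assuming a POSIX platform with pthreads; `raise_on_current_thread` and `signal_reaches_current_thread` are hypothetical names used for illustration and are not HotSpot functions.

```c++
#include <cassert>
#include <signal.h>
#include <pthread.h>

// Set by the handler so the caller can observe that the signal was delivered.
static volatile sig_atomic_t g_handled = 0;

static void on_signal(int) { g_handled = 1; }

// Hypothetical helper (not a HotSpot function): send `sig` to the calling
// thread specifically, instead of to the process as a whole.
// Returns 0 on success, an error number otherwise.
int raise_on_current_thread(int sig) {
  return pthread_kill(pthread_self(), sig);
}

// Installs a handler for SIGUSR1, signals the current thread, and reports
// whether the handler has run by the time the call returns.
bool signal_reaches_current_thread() {
  struct sigaction sa = {};
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, nullptr) != 0) return false;
  g_handled = 0;
  if (raise_on_current_thread(SIGUSR1) != 0) return false;
  return g_handled == 1;
}
```

With a process-directed `raise()`, any thread that has the signal unblocked may end up handling it, which is why targeting the current thread explicitly matters for code that expects the handler to run on the faulting thread.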
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19394#discussion_r1616372058 From azafari at openjdk.org Mon May 27 21:28:01 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 27 May 2024 21:28:01 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: <8_W3gPFqX8RC7V2QvFSmKAOTEK4z6uHOf4NnA0RDp7A=.428d32d1-0624-48de-ac7b-0f1acc6a0a14@github.com> References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> <8_W3gPFqX8RC7V2QvFSmKAOTEK4z6uHOf4NnA0RDp7A=.428d32d1-0624-48de-ac7b-0f1acc6a0a14@github.com> Message-ID: On Mon, 27 May 2024 13:04:32 GMT, Stefan Karlsson wrote: >> If your comment refers only to these lines of code, they are already verified. Since, inside the split function, the sub-regions get the new flags and all the reserved and committed amounts are moved from the large region to the new ones. So, the accounting of memory is correct. >> >> FWIW, if we trace down the call at line 1346 of `total_space_rs = Metaspace::reserve_address_space_for_compressed_classes(total_range_size, false /* optimize_for_zero_base */);` the region may get different flags of `mtClass` or `mtMetaspace` based on the checked criteria down there. >> >> If you comment on all such cases, then I will double check for them and add assertion for. > > My point is that the `archive_space_rs` and `class_space_rs` can get the wrong flags assigned to them. The split functions don't change them. Right? 
> > I would like to see the code run through our testing with these checks: > > assert(archive_space_rs.nmt_flag() == mtClassShared, "Sanity"); > assert(class_space_rs.nmt_flag() == mtClass, "Sanity"); The call to `MemTracker::record_virtual_memory_split_reserved` at line 1364 takes two flags for the split parts. The corresponding regions in NMT take those flags. The sanity assertions will be added anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1616398251 From dholmes at openjdk.org Mon May 27 21:58:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 21:58:04 GMT Subject: RFR: 8329958: Windows x86 build fails: downcallLinker.cpp(36) redefinition In-Reply-To: References: Message-ID: On Mon, 27 May 2024 02:13:44 GMT, David Holmes wrote: > Trivial fix to add JNICALL to the function declaration. > > This will be backported to JDK 22. > > Testing: > - tier1 sanity builds > > Thanks Thanks for the reviews. Title updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19406#issuecomment-2134079429 From dholmes at openjdk.org Mon May 27 21:58:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 May 2024 21:58:04 GMT Subject: Integrated: 8329958: Windows x86 build fails: downcallLinker.cpp(36) redefinition In-Reply-To: References: Message-ID: On Mon, 27 May 2024 02:13:44 GMT, David Holmes wrote: > Trivial fix to add JNICALL to the function declaration. > > This will be backported to JDK 22. > > Testing: > - tier1 sanity builds > > Thanks This pull request has now been integrated.
Changeset: 86eb5d9f Author: David Holmes URL: https://git.openjdk.org/jdk/commit/86eb5d9f3be30ff9df1318f18ab73c7129c978f6 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8329958: Windows x86 build fails: downcallLinker.cpp(36) redefinition Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/19406 From aph at openjdk.org Mon May 27 22:07:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 May 2024 22:07:01 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: References: <9sxDci-Wb9APcvhuEgjkpiQ2t5DnWD3pOHlOhyCBLLg=.c6efe9a2-234b-4125-9211-57016073c04a@github.com> Message-ID: On Mon, 27 May 2024 17:14:34 GMT, Martin Doerr wrote: > > Why would it not be zero? Some classes don't have secondary super types. > > I had to check for `r_array_length >= 0` here: https://github.com/openjdk/jdk/pull/19368/files#diff-0f708565c9e138b8013165540634368334f5d1df2ba437e39696e9791440050dR2312 The x86 implementation doesn't do that and I wonder why. Doesn't it access stale memory, here? No, because we already checked that there must be something to look at before calling the slow path. Invariant: array_length == popcount(bitmap) > https://github.com/openjdk/jdk/blob/be1d374bc54d43aae3b3c1feace22d38fe2156b6/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L4967 If there's a bit set in the bitmap, then there must be a corresponding entry in the array. If we get to the slow path there must be at least two bits set in the bitmap. Therefore, at this point, array_length >= 2. 
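The invariant stated above (array_length == popcount(bitmap)) can be illustrated with a small model. This is a simplified sketch under stated assumptions, not the HotSpot implementation: `SupersTable`, `popcount64`, `packed_index` and `probe` are hypothetical names, entries are plain integers rather than `Klass*`, and collision handling (the slow path) is deliberately left out.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable population count: number of set bits in a 64-bit word.
int popcount64(uint64_t x) {
  int n = 0;
  while (x != 0) { x &= x - 1; ++n; }  // clear the lowest set bit each round
  return n;
}

// Simplified model of a hashed table (illustrative only).
struct SupersTable {
  uint64_t bitmap;           // bit i set => hash slot i is occupied
  std::vector<int> entries;  // packed: one entry per set bit, in slot order
};

// The invariant from the discussion: one array element per set bit.
bool invariant_holds(const SupersTable& t) {
  return t.entries.size() == static_cast<size_t>(popcount64(t.bitmap));
}

// Position in the packed array of the entry for `slot` (0..63).
// Only meaningful when the bit for `slot` is set.
int packed_index(uint64_t bitmap, int slot) {
  uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
  return popcount64(below);
}

// Returns true if `id` is stored at hash slot `slot`. The bitmap is tested
// first, so the array is never read for an empty slot (or an empty table).
bool probe(const SupersTable& t, int slot, int id) {
  if (((t.bitmap >> slot) & 1) == 0) return false;
  return t.entries[packed_index(t.bitmap, slot)] == id;
}
```

In this model an empty table has `bitmap == 0`, so every probe returns before touching `entries`; that is the sense in which a zero array length is harmless as long as the bitmap is checked first.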
------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2134088392 From aph at openjdk.org Mon May 27 22:07:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 May 2024 22:07:01 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: References: Message-ID: <9r7YIjXZrqTi4r-iXlUHrvP-qmL6Kz2DvYylL_vjd4E=.3f3631a8-7dd7-4309-ae1e-0d884796b10f@github.com> On Thu, 23 May 2024 14:11:36 GMT, Martin Doerr wrote: > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? > How can we verify it? By comparing the performance using the micro benchmarks? > > Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads): > > Original > SecondarySuperCacheHits: 13.033 ?(99.9%) 0.058 ns/op [Average] > SecondarySuperCacheInterContention.test avgt 15 432.366 ? 8.364 ns/op > SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ? 8.460 ns/op > SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ? 10.819 ns/op > SecondarySuperCacheIntraContention.test avgt 15 355.192 ? 3.597 ns/op > SecondarySupersLookup.testNegative00 avgt 15 12.274 ? 0.026 ns/op > SecondarySupersLookup.testNegative01 avgt 15 12.300 ? 0.039 ns/op > SecondarySupersLookup.testNegative02 avgt 15 12.304 ? 0.034 ns/op > SecondarySupersLookup.testNegative03 avgt 15 12.276 ? 0.050 ns/op > SecondarySupersLookup.testNegative04 avgt 15 12.235 ? 0.044 ns/op > SecondarySupersLookup.testNegative05 avgt 15 12.308 ? 0.156 ns/op > SecondarySupersLookup.testNegative06 avgt 15 12.291 ? 0.048 ns/op > SecondarySupersLookup.testNegative07 avgt 15 12.307 ? 0.052 ns/op > SecondarySupersLookup.testNegative08 avgt 15 12.398 ? 0.075 ns/op > SecondarySupersLookup.testNegative09 avgt 15 12.552 ? 0.122 ns/op > SecondarySupersLookup.testNegative10 avgt 15 12.490 ? 
0.083 ns/op > SecondarySupersLookup.testNegative16 avgt 15 12.565 ? 0.092 ns/op > SecondarySupersLookup.testNegative20 avgt 15 19.059 ? 0.958 ns/op > SecondarySupersLookup.testNegative30 avgt 15 19.268 ? 0.124 ns/op > SecondarySupersLookup.testNegative32 avgt 15 20.059 ? 0.114 ns/op > SecondarySupersLookup.testNegative40 avgt 15 25.117 ? 0.368 ns/op > SecondarySupersLookup.testNegative50 avgt 15 32.735 ? 0.359 ns/op > SecondarySupersLookup.testNegative55 avgt 15 34.866 ? 0.152 ns/op > SecondarySupersLookup.testNegative56 avgt 15 35.492 ? 0.276 ns/op > SecondarySupersLookup.testNegative57 avgt 15 36.620 ? 0.334 ns/op > SecondarySupersLookup.testNegative58 avgt 15 37.226 ? 0.180 ns/op > SecondarySupersLookup.testNegative59 avgt 15 37.774 ? 0.241 ns/op > SecondarySupersLookup.testNegative60 avgt 15 38.627 ? 1.451 ns/op > SecondarySupersLookup.testNegative61 avgt 15 39.395 ? 0.249 ns/op > SecondarySupersLookup.testNegative62 avgt 15 ... `cmpdi(CCR0, r_bitmap, (bit + 1) & Klass::SECONDARY_SUPERS_TABLE_MASK);` Why is this a compare, not a bit test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2134089608 From iklam at openjdk.org Tue May 28 05:37:05 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 28 May 2024 05:37:05 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS In-Reply-To: References: Message-ID: <_nlJfozSSKixiGDELAYqHEfoiWzEwblrsM3ateLoFuw=.506c8ecc-27d8-40c2-ac7a-71f08790b03c@github.com> On Mon, 27 May 2024 09:13:25 GMT, Thomas Stuefe wrote: > > > Seems okay. This test should have had `requires vm.cds` anyway. > > > Just out of curiosity why is CDS not compatible with an exploded build? > > @dholmes-ora Thanks for the review. Honestly, I don't know. Maybe @iklam knows. > > > Isn't the exploded build supposed to be as fast as possible? I think that's why people use it, and it'd be a shame to allow anything, such as building a CDS arcive, to slow that process down. 
> > Sure, but not generating a CDS archive at build time and being unable to dump or use an archive at all are two different things. The exploded build has tens of thousands of class files. If any of them are modified, the CDS archive may no longer be valid. There's no quick way of checking that. That's why CDS doesn't support the exploded build (or any apps that load class files from a directory). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2134378616 From rehn at openjdk.org Tue May 28 06:58:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 May 2024 06:58:05 GMT Subject: RFR: 8332265: RISC-V: Materialize pointers faster by using a temp register [v7] In-Reply-To: References: Message-ID: <4yPCk16w-U3oI-jkPoMINpaETYFfUIkp79yc7FTndtk=.5e30bb34-59d6-40dc-9432-1515fc725e47@github.com> On Fri, 24 May 2024 11:27:07 GMT, Robbin Ehn wrote: > Updated change looks good. It would be nice to see how much this will benefit performance. Here are 'some' numbers; it is still unclear if these actually are significant:
                      BASELINE   | movptr2
fop                   3120 msec  | 2811 msec  =0.900962
h2                    19156 msec | 17600 msec =0.918772
jython                24060 msec | 23343 msec =0.9702
luindex               3222 msec  | 3226 msec  =1.00124
lusearch              4383 msec  | 4380 msec  =0.999316
lusearch-fix          4096 msec  | 4359 msec  =1.06421
pmd                   7417 msec  | 7342 msec  =0.989888
jython                24060 msec | 23343 msec =0.9702
fop(Xcomp)            3060 msec  | 3058 msec  =0.999346
h2(Xcomp)             38724 msec | 38717 msec =0.999819
jython(Xcomp)         29999 msec | 29694 msec =0.989833
luindex(Xcomp)        5259 msec  | 5195 msec  =0.98783
lusearch(Xcomp)       6364 msec  | 6269 msec  =0.985072
lusearch-fix(Xcomp)   6430 msec  | 6534 msec  =1.01617
pmd(Xcomp)            7360 msec  | 6999 msec  =0.950951
jython(Xcomp)         29999 msec | 29694 msec =0.989833
Avg: 0.983353
Integrating later today!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19246#issuecomment-2134472452 From stuefe at openjdk.org Tue May 28 07:00:02 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 May 2024 07:00:02 GMT Subject: RFR: 8332105: Exploded JDK does not include CDS In-Reply-To: <_nlJfozSSKixiGDELAYqHEfoiWzEwblrsM3ateLoFuw=.506c8ecc-27d8-40c2-ac7a-71f08790b03c@github.com> References: <_nlJfozSSKixiGDELAYqHEfoiWzEwblrsM3ateLoFuw=.506c8ecc-27d8-40c2-ac7a-71f08790b03c@github.com> Message-ID: On Tue, 28 May 2024 05:34:02 GMT, Ioi Lam wrote: > > > > Seems okay. This test should have had `requires vm.cds` anyway. > > > > Just out of curiosity why is CDS not compatible with an exploded build? > > > > > > @dholmes-ora Thanks for the review. Honestly, I don't know. Maybe @iklam knows. > > > Isn't the exploded build supposed to be as fast as possible? I think that's why people use it, and it'd be a shame to allow anything, such as building a CDS archive, to slow that process down. > > > > > > Sure, but not generating a CDS archive at build time and being unable to dump or use an archive at all are two different things. > > The exploded build has tens of thousands of class files. If any of them are modified, the CDS archive may no longer be valid. There's no quick way of checking that. That's why CDS doesn't support the exploded build (or any apps that load class files from a directory). Ah, thank you for explaining. That makes sense.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19188#issuecomment-2134477955 From epeter at openjdk.org Tue May 28 07:33:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 07:33:07 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: Message-ID: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> On Mon, 8 Apr 2024 02:35:33 GMT, Jatin Bhateja wrote: >> This bug fix patch tightens the predication check for small constant length clear array pattern and relaxes associated feature checks. Modified a few comments for clarity. >> >> Kindly review and approve. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup predicates. src/hotspot/cpu/x86/x86.ad line 1753: > 1751: } > 1752: break; > 1753: case Op_ClearArray: This seems problematic, and may lead to the regression in https://bugs.openjdk.org/browse/JDK-8332487 On non-AVX512 platforms, this is now always `true` instead of always `false`. Probably this was not intended, and you thought this was going to default to `false`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18464#discussion_r1616735713 From tschatzl at openjdk.org Tue May 28 09:30:29 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 28 May 2024 09:30:29 GMT Subject: RFR: 8332936: Test vmTestbase/metaspace/gc/watermark_70_80/TestDescription.java fails with no GC's recorded Message-ID: Hi all, please review this change to exclude the watermark tests from use with -Xcomp. The reported failures are caused by -Xcomp triggering the wrong kind of garbage collection pauses (CodeCache-related GCs instead of Metadata-related GCs), on which the test then fails. The proposed solution is to just disable the tests with -Xcomp: the tests are not related to compilation at all.
Testing: local, gha Thanks, Thomas ------------- Commit messages: - 8332936 Changes: https://git.openjdk.org/jdk/pull/19421/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19421&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332936 Stats: 4 lines in 4 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19421/head:pull/19421 PR: https://git.openjdk.org/jdk/pull/19421 From stefank at openjdk.org Tue May 28 09:38:01 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 May 2024 09:38:01 GMT Subject: RFR: 8332936: Test vmTestbase/metaspace/gc/watermark_70_80/TestDescription.java fails with no GC's recorded In-Reply-To: References: Message-ID: On Tue, 28 May 2024 09:25:29 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change to exclude the watermark tests from use with -Xcomp. > > The failures reported are related to -Xcomp triggering the wrong kind of garbage collection pauses (CodeCache related GCs instead of Metadata related GCs) the test then fails on. > > The proposed solution is to just disable the tests with -Xcomp: the tests are not related to compilation at all. > > Testing: local, gha > > Thanks, > Thomas Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19421#pullrequestreview-2082261241 From epeter at openjdk.org Tue May 28 09:50:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 May 2024 09:50:09 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> References: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> Message-ID: On Tue, 28 May 2024 07:30:54 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup predicates. > > src/hotspot/cpu/x86/x86.ad line 1753: > >> 1751: } >> 1752: break; >> 1753: case Op_ClearArray: > > This seems problematic, and may lead to the regression in https://bugs.openjdk.org/browse/JDK-8332487 > > On non-AVX512 platforms, this is now always `true` instead of always `false`. Probably this was not intended, and you thought this was going to default to `false`? I don't understand what you are implying. Are you saying this is not the reason for the regression?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18464#discussion_r1616928926 From jbhateja at openjdk.org Tue May 28 10:12:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 28 May 2024 10:12:15 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: <4s13KsZ8dnv_t_5AUyOFmjWsUwRJDtl0OjBGOMlmlRs=.bcf5f967-d70f-4879-bb16-2d1045a63fb4@github.com> Message-ID: On Tue, 28 May 2024 09:47:08 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/x86.ad line 1753: >> >>> 1751: } >>> 1752: break; >>> 1753: case Op_ClearArray: >> >> This seems problematic, and may lead to the regression in https://bugs.openjdk.org/browse/JDK-8332487 >> >> On non-AVX512 platforms, this is now always `true` instead of always `false`. Probably this was not intended, and you thought this was going to default to `false`? > > I don't understand what you are implying. Are you saying this is not the reason for the regression? Yes, this can cause a regression, since on non-AVX512 targets the compiler may now not emit a StoreL-based instruction sequence and may instead select one of the clear-array patterns based on target feature checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18464#discussion_r1616958675 From rehn at openjdk.org Tue May 28 12:57:11 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 May 2024 12:57:11 GMT Subject: Integrated: 8332265: RISC-V: Materialize pointers faster by using a temp register In-Reply-To: References: Message-ID: On Wed, 15 May 2024 09:34:11 GMT, Robbin Ehn wrote: > Hi, please consider! > > Materializing a 48-bit pointer using an additional register can be done with: > lui + lui + slli + add + addi > This is 15% faster both on VF2 and in CPU models, compared to movptr(). > > As we often materialize during calls, there are free registers. > > I have chosen just a few spots to use it; many more can use it. > E.g. la() with tmp register can use li48 instead of movptr.
> > Running tests now (so far so good), since if I screwed up IC calls it should show up quickly. > And benchmarks when hardware is free. This pull request has now been integrated. Changeset: 7b52d0ac Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/7b52d0acfc7d6083b407efa0877c139e9837f86b Stats: 212 lines in 8 files changed: 123 ins; 13 del; 76 mod 8332265: RISC-V: Materialize pointers faster by using a temp register Reviewed-by: fyang, luhenry, mli ------------- PR: https://git.openjdk.org/jdk/pull/19246 From jsjolen at openjdk.org Tue May 28 13:00:20 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 28 May 2024 13:00:20 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v9] In-Reply-To: References: Message-ID: On Mon, 27 May 2024 10:06:44 GMT, Andrey Turbanov wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/return-reference' into return-reference >> - Use references when using top() > > src/hotspot/share/utilities/growableArray.hpp line 173: > >> 171: E const& top() const { >> 172: assert(_len > 0, "empty"); >> 173: return _data[_len - 1]; > > Suggestion: > > return _data[_len - 1]; Oof, thank you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18975#discussion_r1617199467 From jsjolen at openjdk.org Tue May 28 13:00:20 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 28 May 2024 13:00:20 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v10] In-Reply-To: References: Message-ID: <25-6a8FYbAADZJKCUbkuAAJfnpfIog7W4G1zn72athM=.3d8392ef-2f8c-4180-afad-e8c8d9e927ad@github.com> > Hi, > > This PR introduces the possibility of using references more often when using GrowableArray, whereas previously this was only possible when using the `at()` method.
This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just as we use `at`. The same goes for `top`, `first`, and `last`.
>
>
> Some example code:
> ```c++
> // Before this patch this worked:
> GrowableArray<int> arr(8, 8, -1); // Pre-fill with 8 -1s
> int& x = arr.at(7);
> if (x == -1) {
>   x = 2;
> }
> assert(arr.at(7) == 2, "this holds");
> // but this was forbidden
> int& y = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E&
> // so we had to do
> int z = arr.at_grow(9, -1);
> if (z == -1) {
>   arr.at_put(9, 2);
> }
> ```
>
>
> Thanks.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/utilities/growableArray.hpp Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18975/files - new: https://git.openjdk.org/jdk/pull/18975/files/ff269e39..210d430d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18975/head:pull/18975 PR: https://git.openjdk.org/jdk/pull/18975 From mdoerr at openjdk.org Tue May 28 13:08:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 May 2024 13:08:13 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? > How can we verify it? By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 37.774 ±
0.241 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op
> SecondarySupersLookup.testNegative62 avgt 15 ...
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix bit test and add assertion for array length. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19368/files - new: https://git.openjdk.org/jdk/pull/19368/files/a5208a72..6753375e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=00-01 Stats: 12 lines in 1 file changed: 7 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19368.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368 PR: https://git.openjdk.org/jdk/pull/19368 From mdoerr at openjdk.org Tue May 28 13:08:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 May 2024 13:08:13 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well In-Reply-To: References: Message-ID: On Thu, 23 May 2024 14:11:36 GMT, Martin Doerr wrote: > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? > How can we verify it? By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 12.274 ±
0.026 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op
> SecondarySupersLookup.testNegative62 avgt 15 ...
Thank you so much for finding my bug! This explains why I got array_length 0. Fixed and added assertion (see 2nd commit).
------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2135170515 From jsjolen at openjdk.org Tue May 28 13:09:21 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 28 May 2024 13:09:21 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: <3Nhm9zU8m07QkS0yKiJyzNaMNxiIS1pBSIuAvYSDhIs=.4f6db9d6-0a24-496c-bbfd-aff240cff369@github.com> On Fri, 24 May 2024 07:55:31 GMT, Thomas Stuefe wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Lower number of pages > > src/hotspot/share/nmt/memoryFileTracker.cpp line 72: > >> 70: return; >> 71: } >> 72: assert(prev->val().out.type() == current->val().in.type(), "must be"); > > Slight modification, since I expect we will stare at the output of this function to analyse broken trees. Please keep record of "brokenness" and assert at the end only. And print out the current number of the mapping, too. Then, on assert, print out "tree broken first at record XXX". I had a go at this. Could you take a look and see if it seems reasonable? > test/hotspot/gtest/nmt/test_nmt_memoryfiletracker.cpp line 53: > >> 51: TEST_VM_F(MemoryFileTrackerTest, Basics) { >> 52: this->basics(); >> 53: } > > Curious, just a question. You like using fixture classes even if not necessary. Why not write the test directly into a TEST_VM? Negligible cost, more often than not it turns out that I need the fixture anyway.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1617214510 PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1617209949 From jsjolen at openjdk.org Tue May 28 13:18:19 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 28 May 2024 13:18:19 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v49] In-Reply-To: References: <1cKD_eCdTb8AmNQwA9T4GFK0xu_CjJeABePgatn8xSY=.ec58f99d-bcd6-4e92-87a4-d1e49d33f4af@github.com> Message-ID: On Fri, 26 Apr 2024 17:42:52 GMT, Gerard Ziemski wrote: >> We could invert the relationship such that the outer class is the `AllStatic` class and the inner class is the allocatable class. I'll look into it at a later stage as it's all a big renaming. >> >> Personally, I don't mind the `::Instance` nomenclature to indicate that "this is the global instance that we're accessing". As long as we keep away from static, global singletons that we can't make many instances of, the way VirtualMemoryTracker and MallocTracker are written, I'm a happy goose. > > I agree with you actually, but the `Instance` jumps out at me and makes me wonder why we decided to use it, compared to the others, that are happy to be static classes. > > We should have it all done the same way. If you like to use `Instance`, then `VirtualMemoryTracker` and `MallocTracker` should use one as well (at some point later). I'm taking this to mean that we can transition to the `Instance` pattern in the future for the rest of the classes, which I do think is the right way to go. Static classes generally make our code more difficult to test.
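To make the two shapes under discussion concrete, here is a minimal C++ sketch. It is not HotSpot code: `FileTracker`, its members and the nested `Instance` facade are hypothetical stand-ins chosen for illustration, and the real NMT classes differ. It only illustrates the testability argument, namely that an allocatable class with a thin static facade can be instantiated freely in tests, while a class whose state is entirely static cannot:

```cpp
#include <cstddef>

// Hypothetical allocatable tracker: all state lives in the object,
// so a test can create as many independent trackers as it needs.
class FileTracker {
  std::size_t _committed = 0;

public:
  void commit(std::size_t bytes)   { _committed += bytes; }
  void uncommit(std::size_t bytes) { _committed -= bytes; }
  std::size_t committed() const    { return _committed; }

  // Static facade for the single process-wide tracker.
  // Production code would call FileTracker::Instance::commit(...);
  // tests can bypass it and work on a local FileTracker instead.
  class Instance {
    static FileTracker* _tracker;

  public:
    static void initialize() {
      if (_tracker == nullptr) {
        _tracker = new FileTracker();
      }
    }
    static void commit(std::size_t bytes)   { _tracker->commit(bytes); }
    static void uncommit(std::size_t bytes) { _tracker->uncommit(bytes); }
    static std::size_t committed()          { return _tracker->committed(); }
  };
};

FileTracker* FileTracker::Instance::_tracker = nullptr;
```

A test can exercise `FileTracker local; local.commit(4096);` without touching the global instance at all. With a fully static class every test would share, and have to reset, the same state.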
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18289#discussion_r1617227778 From jsjolen at openjdk.org Tue May 28 13:27:16 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 28 May 2024 13:27:16 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 06:13:54 GMT, Thomas Stuefe wrote: >>> We claim that: >>> >>> > Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. >>> >>> May I ask how you ran it? I would like to be able to reproduce our claim. >> >> Sure, it was a while since I ran the benchmark. You're going to have to do a bit of work here, to get it working. >> >> You take this file: https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L1 >> >> And you port it to the VMATree instead of VMADict (or whatever it's called). Then you run it and look at output. You could also take one of the stress tests that I made, remove the verification calls, and run the same stress test for VirtualMemoryTracker. > >> > We claim that: >> > > Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. >> > >> > >> > May I ask how you ran it? I would like to be able to reproduce our claim. >> >> Sure, it was a while since I ran the benchmark. You're going to have to do a bit of work here, to get it working. >> >> You take this file: https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L1 >> >> And you port it to the VMATree instead of VMADict (or whatever it's called). Then you run it and look at output. You could also take one of the stress tests that I made, remove the verification calls, and run the same stress test for VirtualMemoryTracker.
> > The claim also makes sense if you think about it. A binary tree will always grossly outperform a linked list for sorted insert/delete. Hi @tstuefe, @gerard-ziemski, @afshin-zafari What do we think is necessary to have this PR merged in? Right now, I know that Thomas has some gripes with the private/public API and visibility. I agree, it can be cleaned up, but can't this wait until after the PR is merged? I believe that there are multiple small cleanups and fixes that would get rid of some ugliness, but the actual functionality of this PR is overall well-tested. I see the following points as needing attention before merging:
1. NativeCallStackStorage -- needs some testing for both summary and detailed mode. *Maybe* get the `bool is_detailed` out of there, but to me this is optional, it receives the info from `MemTracker` anyway, just through the constructor.
2. The locking and reporting mechanisms. Is locking the MemoryFileTracker structures for the duration of the JCMD call acceptable? This means potential stalling of the VM, no?
3. Run through some better/deeper testing than just GHA
Is there anything that I am missing? This will have limited rollout to the subset of users using both ZGC and NMT. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2135213186 From stuefe at openjdk.org Tue May 28 13:40:27 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 May 2024 13:40:27 GMT Subject: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp Message-ID: In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea.
It contains a workaround: // lazily create the single threaded resource area // pick a size which is not a standard since the pools don't exist yet _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size); It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet. This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment. --- Tests: GHAs. I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life. ------------- Commit messages: - copyrights - remove non_pool_size - start Changes: https://git.openjdk.org/jdk/pull/19425/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19425&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333047 Stats: 8 lines in 3 files changed: 0 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/19425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19425/head:pull/19425 PR: https://git.openjdk.org/jdk/pull/19425 From mdoerr at openjdk.org Tue May 28 14:07:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 May 2024 14:07:02 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 13:08:13 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! 
>> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? >> How can we verify it? By comparing the performance using the micro benchmarks?
>>
>> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
>> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
>> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
>> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
>> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
>> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
>> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
>> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
>> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
>> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
>> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
>> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
>> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
>> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
>> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
>> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
>> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
>> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
>> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
>> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
>> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
>> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
>> SecondarySupersLookup.testNegative56 avgt 15 35.492 ±
0.276 ns/op
>> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
>> SecondarySupersLookup.testNegative58 avgt 15 37.226 ± 0.180 ns/op
>> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
>> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
>> SecondarySupersLookup.testNegative61 avgt 15 ...
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit test and add assertion for array length. Performance does not seem to be affected by that bug. Note that I have used https://github.com/openjdk/jdk/pull/19427 to run TypePollution micro benchmarks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2135302123 From matsaave at openjdk.org Tue May 28 14:42:07 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 28 May 2024 14:42:07 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: On Sat, 25 May 2024 06:48:26 GMT, Ioi Lam wrote: >> ### Overview >> >> This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. >> >> I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, >> - `B` is the same class as `A`; or >> - `B` is a supertype of `A`; or >> - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. >> >> Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. >> >> Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`.
>> >> (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) >> >> ### Static CDS Archive >> >> This feature is implemented in three steps for static CDS archive dump: >> >> 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: >> >> @cp java/util/Objects 2 19 106 >> >> 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. >> >> 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. >> >> ### Dynamic CDS Archive >> >> When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. >> >> ### Limitations >> >> - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. >> - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Fixed typo in previous commit > - Merge branch 'master' into 8293980-resolve-fields-at-dumptime > - @matias9927 comments - moved remove_resolved_field_entries_if_non_deterministic() to cpCache > - Merge branch 'master' into 8293980-resolve-fields-at-dumptime > - 8293980: Resolve CONSTANT_FieldRef at CDS dump time Changes look good, thanks! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19355#pullrequestreview-2083038736 From asmehra at openjdk.org Tue May 28 14:47:26 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 28 May 2024 14:47:26 GMT Subject: RFR: 8333093: Incorrect comment in zAddress_aarch64.cpp Message-ID: <6yybRqvW1XKIzua5ysA940z3LlGxUSLj06fNiXljqiY=.a66836e1-ba5c-42fe-b1cd-3034bb40a76b@github.com> This PR is just updating the comments, so no need for any testing. ------------- Commit messages: - 8333093: Incorrect comment in zAddress_aarch64.cpp Changes: https://git.openjdk.org/jdk/pull/19428/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19428&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333093 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19428/head:pull/19428 PR: https://git.openjdk.org/jdk/pull/19428 From mdoerr at openjdk.org Tue May 28 14:59:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 28 May 2024 14:59:14 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v3] In-Reply-To: References: Message-ID: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com> > PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review! > I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea? > How can we verify it? 
By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test avgt 15 432.366 ± 8.364 ns/op
> SecondarySuperCacheInterContention.test:t1 avgt 15 432.310 ± 8.460 ns/op
> SecondarySuperCacheInterContention.test:t2 avgt 15 432.422 ± 10.819 ns/op
> SecondarySuperCacheIntraContention.test avgt 15 355.192 ± 3.597 ns/op
> SecondarySupersLookup.testNegative00 avgt 15 12.274 ± 0.026 ns/op
> SecondarySupersLookup.testNegative01 avgt 15 12.300 ± 0.039 ns/op
> SecondarySupersLookup.testNegative02 avgt 15 12.304 ± 0.034 ns/op
> SecondarySupersLookup.testNegative03 avgt 15 12.276 ± 0.050 ns/op
> SecondarySupersLookup.testNegative04 avgt 15 12.235 ± 0.044 ns/op
> SecondarySupersLookup.testNegative05 avgt 15 12.308 ± 0.156 ns/op
> SecondarySupersLookup.testNegative06 avgt 15 12.291 ± 0.048 ns/op
> SecondarySupersLookup.testNegative07 avgt 15 12.307 ± 0.052 ns/op
> SecondarySupersLookup.testNegative08 avgt 15 12.398 ± 0.075 ns/op
> SecondarySupersLookup.testNegative09 avgt 15 12.552 ± 0.122 ns/op
> SecondarySupersLookup.testNegative10 avgt 15 12.490 ± 0.083 ns/op
> SecondarySupersLookup.testNegative16 avgt 15 12.565 ± 0.092 ns/op
> SecondarySupersLookup.testNegative20 avgt 15 19.059 ± 0.958 ns/op
> SecondarySupersLookup.testNegative30 avgt 15 19.268 ± 0.124 ns/op
> SecondarySupersLookup.testNegative32 avgt 15 20.059 ± 0.114 ns/op
> SecondarySupersLookup.testNegative40 avgt 15 25.117 ± 0.368 ns/op
> SecondarySupersLookup.testNegative50 avgt 15 32.735 ± 0.359 ns/op
> SecondarySupersLookup.testNegative55 avgt 15 34.866 ± 0.152 ns/op
> SecondarySupersLookup.testNegative56 avgt 15 35.492 ± 0.276 ns/op
> SecondarySupersLookup.testNegative57 avgt 15 36.620 ± 0.334 ns/op
> SecondarySupersLookup.testNegative58 avgt 15 37.226 ±
0.180 ns/op
> SecondarySupersLookup.testNegative59 avgt 15 37.774 ± 0.241 ns/op
> SecondarySupersLookup.testNegative60 avgt 15 38.627 ± 1.451 ns/op
> SecondarySupersLookup.testNegative61 avgt 15 39.395 ± 0.249 ns/op
> SecondarySupersLookup.testNegative62 avgt 15 ...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Adapt assertion. We sometimes have only 1 element in the secondary supers array.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19368/files
  - new: https://git.openjdk.org/jdk/pull/19368/files/6753375e..c1840719

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=02
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=01-02

Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/19368.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368

PR: https://git.openjdk.org/jdk/pull/19368

From stefank at openjdk.org Tue May 28 15:00:03 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Tue, 28 May 2024 15:00:03 GMT
Subject: RFR: 8333093: Incorrect comment in zAddress_aarch64.cpp
In-Reply-To: <6yybRqvW1XKIzua5ysA940z3LlGxUSLj06fNiXljqiY=.a66836e1-ba5c-42fe-b1cd-3034bb40a76b@github.com>
References: <6yybRqvW1XKIzua5ysA940z3LlGxUSLj06fNiXljqiY=.a66836e1-ba5c-42fe-b1cd-3034bb40a76b@github.com>
Message-ID: 

On Tue, 28 May 2024 14:38:40 GMT, Ashutosh Mehra wrote:

> This PR is just updating the comments, so no need for any testing.

Marked as reviewed by stefank (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/19428#pullrequestreview-2083085958 From asmehra at openjdk.org Tue May 28 15:05:08 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 28 May 2024 15:05:08 GMT Subject: RFR: 8333093: Incorrect comment in zAddress_aarch64.cpp In-Reply-To: <6yybRqvW1XKIzua5ysA940z3LlGxUSLj06fNiXljqiY=.a66836e1-ba5c-42fe-b1cd-3034bb40a76b@github.com> References: <6yybRqvW1XKIzua5ysA940z3LlGxUSLj06fNiXljqiY=.a66836e1-ba5c-42fe-b1cd-3034bb40a76b@github.com> Message-ID: On Tue, 28 May 2024 14:38:40 GMT, Ashutosh Mehra wrote: > This PR is just updating the comments, so no need for any testing. As this is just a change in comments, merging it with one approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19428#issuecomment-2135455501 From asmehra at openjdk.org Tue May 28 15:05:09 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 28 May 2024 15:05:09 GMT Subject: Integrated: 8333093: Incorrect comment in zAddress_aarch64.cpp In-Reply-To: <6yybRqvW1XKIzua5ysA940z3LlGxUSLj06fNiXljqiY=.a66836e1-ba5c-42fe-b1cd-3034bb40a76b@github.com> References: <6yybRqvW1XKIzua5ysA940z3LlGxUSLj06fNiXljqiY=.a66836e1-ba5c-42fe-b1cd-3034bb40a76b@github.com> Message-ID: On Tue, 28 May 2024 14:38:40 GMT, Ashutosh Mehra wrote: > This PR is just updating the comments, so no need for any testing. This pull request has now been integrated. Changeset: 51ae08f7 Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/51ae08f72b879bc611177ea643cd88e36185d9e8 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8333093: Incorrect comment in zAddress_aarch64.cpp Reviewed-by: stefank ------------- PR: https://git.openjdk.org/jdk/pull/19428 From mli at openjdk.org Tue May 28 15:41:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 May 2024 15:41:27 GMT Subject: RFR: 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr Message-ID: Hi, Can you help to review the patch? 
As discussed, https://github.com/openjdk/jdk/pull/19246#discussion_r1613279908, it's worth to make the code more readable. For movptr1, add some comments to help understand the tricky part. For movptr2, it uses the similar (tricky) way as movptr1, so I align the code implementation with movptr1, and try to make it more straightforward. I tried it, hope it's better. Thanks. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19431&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332899 Stats: 32 lines in 1 file changed: 25 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19431/head:pull/19431 PR: https://git.openjdk.org/jdk/pull/19431 From sgibbons at openjdk.org Tue May 28 16:03:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 16:03:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 20:12:07 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Test clarifications > > test/jdk/java/lang/StringBuffer/IndexOf.java line 28: > >> 26: * @summary Test indexOf and lastIndexOf >> 27: * @run main/othervm IndexOf >> 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf > > I suggest to split it into 2 subtest jobs and use `@requires vm.cpu.features ~= ".*avx2.*"` for second which specified `-XX:UseAVX=2`. > See `compiler/loopopts/superword/TestDependencyOffsets.java` for example. @vnkozlov I'm getting an error in CI tests with this line added. Can you please advise? `TEST RESULT: Error. 
Parse Exception: Syntax error in @requires expression: invalid name: vm.cpu.features` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617556335 From sgibbons at openjdk.org Tue May 28 16:06:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 16:06:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v41] In-Reply-To: References: Message-ID: On Sat, 25 May 2024 06:33:51 GMT, Alan Bateman wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test; review comments > > test/jdk/java/lang/StringBuffer/IndexOf.java line 47: > >> 45: public class IndexOf { >> 46: >> 47: static Random generator = new Random(); > > @RogerRiggs Would you have cycles to look at Scott's changes to this test? I suspect it will need to be re-structured, re-formatted, and commented to get into maintainable shape. I am going to revert my changes to this file as the test `jdk/java/lang/String/IndexOf.java` covers the code better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617560447 From jsjolen at openjdk.org Tue May 28 16:09:19 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 28 May 2024 16:09:19 GMT Subject: RFR: 8331193: Return references when possible in GrowableArray [v11] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces the possibility of using references more often when using GrowableArray, where as previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`. 
>
>
> Some example code:
> ```c++
> // Before this patch this worked:
> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s
> int& x = arr.at(7);
> if (x == -1) {
>   x = 2;
> }
> assert(arr.at(7) == 2, "this holds");
> // but this was forbidden
> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E&
> // so we had to do
> int x = arr.at_grow(9, -1);
> if (x == -1) {
>   arr.at_put(9, 2);
> }
> ```
>
> Thanks.

Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:

  - Merge remote-tracking branch 'origin/return-reference' into return-reference
  - Also add test for first and last

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18975/files
  - new: https://git.openjdk.org/jdk/pull/18975/files/210d430d..aefe0ccc

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=10
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18975&range=09-10

Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/18975.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18975/head:pull/18975

PR: https://git.openjdk.org/jdk/pull/18975

From sgibbons at openjdk.org Tue May 28 16:12:43 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 28 May 2024 16:12:43 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v44]
In-Reply-To: 
References: 
Message-ID: 

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers:
>
>
> Benchmark Score Latest
> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x
> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x
> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x
> StringIndexOf.constantPattern 9.361 11.906 1.271872663x
> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x
> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x
> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x
> StringIndexOf.success 9.186 9.713 1.057369911x
> StringIndexOf.successBig 14.341 46.343 3.231504079x
> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x
> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x
> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x
> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x
> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x
> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x
> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x
> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  Revert changes to IndexOf.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16753/files
  - new: https://git.openjdk.org/jdk/pull/16753/files/15994a39..01cb58fb

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=43
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=42-43

Stats: 382 lines in 1 file changed: 0 ins; 222 del; 160 mod
Patch: https://git.openjdk.org/jdk/pull/16753.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753

PR: https://git.openjdk.org/jdk/pull/16753

From epeter at openjdk.org Tue May 28 16:14:04 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 28 May 2024 16:14:04 GMT
Subject: RFR: 8331193: Return references when
 possible in GrowableArray [v11]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 28 May 2024 16:09:19 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces the possibility of using references more often when using GrowableArray, where as previously this was only possible when using the `at()` method. This lets us avoid copying and redundant method calls and makes the API more streamlined. After the patch, we can use `at_grow` just like `at` works. The same goes for `top`, `first`, and `last`.
>>
>>
>> Some example code:
>> ```c++
>> // Before this patch this worked:
>> GrowableArray<int> arr(8,8,-1); // Pre-fill with 8 -1s
>> int& x = arr.at(7);
>> if (x == -1) {
>>   x = 2;
>> }
>> assert(arr.at(7) == 2, "this holds");
>> // but this was forbidden
>> int& x = arr.at_grow(9, -1); // Compilation error! at_grow returns E, not E&
>> // so we had to do
>> int x = arr.at_grow(9, -1);
>> if (x == -1) {
>>   arr.at_put(9, 2);
>> }
>> ```
>>
>> Thanks.
>
> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/return-reference' into return-reference
>  - Also add test for first and last

Thanks for adding the extra tests for all methods! Looks good to me now!

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18975#pullrequestreview-2083321746

From stuefe at openjdk.org Tue May 28 16:36:02 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 28 May 2024 16:36:02 GMT
Subject: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp
In-Reply-To: 
References: 
Message-ID: 

On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote:

> In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea.
> > It contains a workaround: > > > // lazily create the single threaded resource area > // pick a size which is not a standard since the pools don't exist yet > _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size); > > > It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet. > > This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment. > > --- > > Tests: GHAs. > I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life. x86 problem unrelated ------------- PR Comment: https://git.openjdk.org/jdk/pull/19425#issuecomment-2135680548 From sviswanathan at openjdk.org Tue May 28 16:42:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 28 May 2024 16:42:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: Message-ID: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 239: > 237: // the needle size is less than 32 bytes, we default to a > 238: // byte-by-byte comparison (this will be rare). > 239: // Is this still true? src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278: > 276: __ bind(L_nextCheck); > 277: __ testq(haystack_len_p, haystack_len_p); > 278: __ je(L_zeroCheckFailed); This check could be removed as the next check covers this one. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 360: > 358: __ push(rcx); > 359: __ push(r8); > 360: __ push(r9); No need to save/restore rcx/r8/r9 on windows platform as well. 
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 379: > 377: > 378: // Assume failure > 379: __ movq(rbp, -1); We are no more using rbp at return point so this is not needed now? src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: > 486: __ cmpq(r11, nMinusK); > 487: __ ja_b(L_return); > 488: __ movq(rax, r11); At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 566: > 564: // rbp: -1 > 565: // XMM_BYTE_0 - first element of needle broadcast > 566: // XMM_BYTE_K - last element of needle broadcast The only registers that are used as input in the switch case are: r14 = needle rbx = haystack rsi = haystack length (n) r12 = needle length (k) r10 = n - k (where k is needle length) XMM_BYTE_0 = first element of needle, broadcast XMM_BYTE_K = last element of needle, broadcast So we could only list these, making it easier to comprehend. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578: > 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value. > 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask, hsPtrRet, needleLen, > 578: needle, haystack, hsLength, tmp1, tmp2, tmp3, rScratch, ae, _masm); If we run out of haystack instead of jumping to L_checkRangeAndReturn, we could directly jump to L_retrunError. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 597: > 595: > 596: // Need a lot of registers here to preserve state across arrays_equals call > 597: This comment is no longer valid, could be removed. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621: > 619: __ addq(hsPtrRet, index); > 620: __ movq(r11, hsPtrRet); > 621: __ jmp(L_checkRangeAndReturn); Why do we have to checkRange here, would it not be always correct? It so we could return r11 directly (by moving into rax). 
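
For readers following the `L_checkRangeAndReturn` question, the range check quoted at line 488 above (`cmpq(r11, nMinusK)`, `ja_b(L_return)`, `movq(rax, r11)`) can be sketched in plain C++. This is only an illustration under assumptions, not the stub code: the helper name `check_range_and_return` is made up, `candidate` stands in for r11 and `n_minus_k` for nMinusK, and it assumes the return register already holds -1 when the `ja` branch is taken, which the quoted snippet does not show.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code. It models the intent of the quoted
// sequence: a candidate match index is only returned if the needle, placed at
// that index, still ends inside the haystack (index <= n - k).
inline int64_t check_range_and_return(int64_t candidate, int64_t n_minus_k) {
  // Assumption: the stub has already rejected needles longer than the haystack.
  assert(n_minus_k >= 0);
  // `ja` is an unsigned "above" branch, so the compare is modeled as unsigned.
  // A negative candidate therefore looks huge and is rejected as well.
  if (static_cast<uint64_t>(candidate) > static_cast<uint64_t>(n_minus_k)) {
    return -1;  // assumed "not found" value left in the return register
  }
  return candidate;
}
```

With a haystack of length 10 and a needle of length 6, `n_minus_k` is 4, so index 4 is the last position this sketch accepts and index 5 is turned into -1.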
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 660: > 658: // Haystack always copied to stack, so 32-byte reads OK > 659: // Haystack length < 32 > 660: // 10 < needle length < 32 Haystack length <= 32 10 < needle length <= 32 src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 721: > 719: false /* char */, knoreg); > 720: __ testl(rTmp3, rTmp3); > 721: __ jne(L_checkRangeAndReturn); Why do we have to checkRange here, would it not be always correct? It so we could return r11 directly (by moving into rax). src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333: > 1331: > 1332: __ cmpq(nMinusK, 32); > 1333: __ jae_b(L_greaterThan32); Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31 __ cmpq(nMinusK, 31); __ jae_b(L_greaterThan32); src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1382: > 1380: > 1381: __ testl(eq_mask, eq_mask); > 1382: __ je(noMatch); We are mixing operation width l and q here. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750: > 1748: // r15 = unused > 1749: // XMM_BYTE_0 - first element of needle, broadcast > 1750: // XMM_BYTE_K - last element of needle, broadcast This comment is duplicated for both small haystack case and big haystack case, could be made a common comment. Also the only registers that are used as input in the switch case are: r14 = needle rbx = haystack rsi = haystack length (n) r12 = needle length (k) r10 = n - k (where k is needle length) XMM_BYTE_0 = first element of needle, broadcast XMM_BYTE_K = last element of needle, broadcast So we could only list these, making it easier to comprehend. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1758: > 1756: // > 1757: // If a match is found, jump to L_checkRange, which ensures the > 1758: // matched needle is not past the end of the haystack. Another comment here would be useful: // The found index is returned in set_bit (r11). 
src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1810: > 1808: // XMM_BYTE_K - last element of needle, broadcast > 1809: // > 1810: // The haystack is > 32 bytes Good to mention some info about the return found index value in comment about how it is a combination of set_bit (r8), hs_ptr, and haystack. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617187600 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617193503 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617216424 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617218826 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617603927 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617318645 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617307443 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617536831 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617569308 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617575018 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617601913 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1616424912 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1616427773 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617263035 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617267415 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617273352 From kvn at openjdk.org Tue May 28 17:00:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 28 May 2024 17:00:22 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: References: Message-ID: <7jqyfDXW_EbstH_s90Fp4O7a214ZaejdM0CyAffzOHs=.544c7a91-c66b-4487-a2bf-0b8e300a94c0@github.com> On Tue, 28 May 2024 16:00:10 GMT, Scott Gibbons wrote: 
>> test/jdk/java/lang/StringBuffer/IndexOf.java line 28: >> >>> 26: * @summary Test indexOf and lastIndexOf >>> 27: * @run main/othervm IndexOf >>> 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf >> >> I suggest to split it into 2 subtest jobs and use `@requires vm.cpu.features ~= ".*avx2.*"` for second which specified `-XX:UseAVX=2`. >> See `compiler/loopopts/superword/TestDependencyOffsets.java` for example. > > @vnkozlov I'm getting an error in CI tests with this line added. Can you please advise? > > `TEST RESULT: Error. Parse Exception: Syntax error in @requires expression: invalid name: vm.cpu.features` You need to add `vm.cpu.features ` line to `test/jdk/TEST.ROOT` file. Similar to what we have in `test/hotspot/jtreg/TEST.ROOT` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617630712 From stuefe at openjdk.org Tue May 28 18:08:03 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 May 2024 18:08:03 GMT Subject: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp In-Reply-To: References: Message-ID: On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote: > In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea. > > It contains a workaround: > > > // lazily create the single threaded resource area > // pick a size which is not a standard since the pools don't exist yet > _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size); > > > It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. 
This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet. > > This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment. > > --- > > Tests: GHAs. > I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life. @jdksjolen could you take a look? You know the Arena coding behind it, and this PR is, in a very circumvent way, one of the prerequisites for NMT simplifications I plan. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19425#issuecomment-2135835425 From sgibbons at openjdk.org Tue May 28 18:30:30 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 18:30:30 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v45] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/01cb58fb..751aace8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=44 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=43-44 Stats: 49 lines in 4 files changed: 20 ins; 13 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Tue May 28 18:30:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 18:30:31 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] 
In-Reply-To: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 12:48:19 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 239: > >> 237: // the needle size is less than 32 bytes, we default to a >> 238: // byte-by-byte comparison (this will be rare). >> 239: // > > Is this still true? Yes. For UL, the code within `L_compareFull` effectively does byte-by-byte. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278: > >> 276: __ bind(L_nextCheck); >> 277: __ testq(haystack_len_p, haystack_len_p); >> 278: __ je(L_zeroCheckFailed); > > This check could be removed as the next check covers this one. No. This is checking for a zero length haystack. The following compare checks for needle length longer than haystack, regardless of the value in each. The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 360: > >> 358: __ push(rcx); >> 359: __ push(r8); >> 360: __ push(r9); > > No need to save/restore rcx/r8/r9 on windows platform as well. OK. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 379: > >> 377: >> 378: // Assume failure >> 379: __ movq(rbp, -1); > > We are no more using rbp at return point so this is not needed now? Removed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: > >> 486: __ cmpq(r11, nMinusK); >> 487: __ ja_b(L_return); >> 488: __ movq(rax, r11); > > At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label. 
I don't want to change this because its reason for existence is to ensure we don't return a value that's beyond the end of the haystack. We don't yet have a good enough test to validate whether we're reading past the end of the haystack, so I like this as insurance. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 566: > >> 564: // rbp: -1 >> 565: // XMM_BYTE_0 - first element of needle broadcast >> 566: // XMM_BYTE_K - last element of needle broadcast > > The only registers that are used as input in the switch case are: > r14 = needle > rbx = haystack > rsi = haystack length (n) > r12 = needle length (k) > r10 = n - k (where k is needle length) > XMM_BYTE_0 = first element of needle, broadcast > XMM_BYTE_K = last element of needle, broadcast > So we could only list these, making it easier to comprehend. I listed these registers to make it clear which registers had no expected value and could be used for temps, etc. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578: > >> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value. >> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask, hsPtrRet, needleLen, >> 578: needle, haystack, hsLength, tmp1, tmp2, tmp3, rScratch, ae, _masm); > > If we run out of haystack instead of jumping to L_checkRangeAndReturn, we could directly jump to L_retrunError. Again, I think we ought to leave this in. Although it executes ~3 instructions that may not be necessary in some cases I think it's best to perform the check. Once we have a good enough test to check reading past the end of the haystack we can change it. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 597: > >> 595: >> 596: // Need a lot of registers here to preserve state across arrays_equals call >> 597: > > This comment is no longer valid, could be removed. 
OK > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621: > >> 619: __ addq(hsPtrRet, index); >> 620: __ movq(r11, hsPtrRet); >> 621: __ jmp(L_checkRangeAndReturn); > > Why do we have to checkRange here? Would it not always be correct? If so, we could return r11 directly (by moving into rax). There are cases where r11 could have a value that, when added to (k - 1), would go past the end of the haystack. I did all in my power to ensure that it doesn't, but there's no test I know of to ensure that condition. I would recommend leaving this in for now. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 660: > >> 658: // Haystack always copied to stack, so 32-byte reads OK >> 659: // Haystack length < 32 >> 660: // 10 < needle length < 32 > > Haystack length <= 32 > 10 < needle length <= 32 Changed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 721: > >> 719: false /* char */, knoreg); >> 720: __ testl(rTmp3, rTmp3); >> 721: __ jne(L_checkRangeAndReturn); > > Why do we have to checkRange here? Would it not always be correct? If so, we could return r11 directly (by moving into rax). OK > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333: > >> 1331: >> 1332: __ cmpq(nMinusK, 32); >> 1333: __ jae_b(L_greaterThan32); > > Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31 > __ cmpq(nMinusK, 31); > __ jae_b(L_greaterThan32); No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1382: > >> 1380: >> 1381: __ testl(eq_mask, eq_mask); >> 1382: __ je(noMatch); > > We are mixing operation width l and q here. Fixed.
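[Editorial sketch, not part of the original message.] The arithmetic behind the `nMinusK` threshold question at line 1333 above can be checked in isolation. The Java below is a minimal sketch under one stated assumption: that the widest load is a 32-byte read lined up with the last needle byte, that is, starting at haystack offset k - 1. The class and method names are hypothetical and do not come from the stub.

```java
public final class FullReadBound {
    // Width in bytes of one AVX2 (256-bit) register.
    static final int VECTOR_BYTES = 32;

    // Assumption: the furthest-reaching load starts at offset (k - 1),
    // where the last needle byte is compared. It is in bounds when its
    // final byte still lies inside an n-byte haystack.
    static boolean fullReadsInBounds(int n, int k) {
        return (k - 1) + VECTOR_BYTES <= n;
    }

    public static void main(String[] args) {
        // Compare the in-bounds condition with the threshold (n - k) >= 31
        // for every needle length k >= 1 and haystack length n >= k.
        for (int k = 1; k <= 64; k++) {
            for (int n = k; n <= 128; n++) {
                boolean byThreshold = (n - k) >= VECTOR_BYTES - 1;
                if (fullReadsInBounds(n, k) != byThreshold) {
                    throw new AssertionError("mismatch at n=" + n + ", k=" + k);
                }
            }
        }
        System.out.println("(n - k) >= 31 agrees with the in-bounds condition");
    }
}
```

Under that assumption, (k - 1) + 32 <= n rearranges to n - k >= 31, which is the same as the (n - k + 1) >= 32 form used in the review comment.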
> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750: > >> 1748: // r15 = unused >> 1749: // XMM_BYTE_0 - first element of needle, broadcast >> 1750: // XMM_BYTE_K - last element of needle, broadcast > > This comment is duplicated for both small haystack case and big haystack case, could be made a common comment. > Also the only registers that are used as input in the switch case are: > r14 = needle > rbx = haystack > rsi = haystack length (n) > r12 = needle length (k) > r10 = n - k (where k is needle length) > XMM_BYTE_0 = first element of needle, broadcast > XMM_BYTE_K = last element of needle, broadcast > So we could only list these, making it easier to comprehend. I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1758: > >> 1756: // >> 1757: // If a match is found, jump to L_checkRange, which ensures the >> 1758: // matched needle is not past the end of the haystack. > > Another comment here would be useful: > // The found index is returned in set_bit (r11). Added. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1810: > >> 1808: // XMM_BYTE_K - last element of needle, broadcast >> 1809: // >> 1810: // The haystack is > 32 bytes > > Good to mention some info about the return found index value in comment about how it is a combination of set_bit (r8), hs_ptr, and haystack. Done. 
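[Editorial sketch, not part of the original message.] The range check discussed throughout this review (`cmpq(r11, nMinusK); ja_b(L_return); movq(rax, r11);`) can be modelled in Java for readers who want to see the invariant on its own. This is a sketch under assumptions: it treats `ja` as an unsigned "above" comparison and assumes the return register already holds -1 when the branch is taken. The class and method names are hypothetical.

```java
public final class RangeCheckSketch {
    // Returns the candidate index if a needle of length k starting there
    // still fits inside a haystack of length n, otherwise -1.
    static long checkRangeAndReturn(long candidate, long n, long k) {
        long nMinusK = n - k;
        // Unsigned comparison: a negative candidate looks like a huge
        // value and is rejected as well.
        if (Long.compareUnsigned(candidate, nMinusK) > 0) {
            return -1; // assumed default "no match" value
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(checkRangeAndReturn(6, 10, 4));  // last valid start
        System.out.println(checkRangeAndReturn(7, 10, 4));  // would run past the end
        System.out.println(checkRangeAndReturn(-1, 10, 4)); // already "no match"
    }
}
```

The check only guards the returned index. As noted elsewhere in the thread, it says nothing about whether a load performed earlier read past the end of the haystack.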
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617663227 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617667775 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617669103 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617671612 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617673870 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617680570 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617699879 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617700813 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617704836 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617705505 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617711973 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617713299 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617714825 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617716598 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617717873 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617726261 From cslucas at openjdk.org Tue May 28 19:22:02 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 28 May 2024 19:22:02 GMT Subject: RFR: JDK-8324341 : Remove redundant preprocessor #if's checks In-Reply-To: <8EQX7Jsg_SGE173q8uesBrV0-DEHZQtzb5aQTQx3A3Q=.cdd410de-4cd4-44d9-a0f1-730b48b522f3@github.com> References: <8EQX7Jsg_SGE173q8uesBrV0-DEHZQtzb5aQTQx3A3Q=.cdd410de-4cd4-44d9-a0f1-730b48b522f3@github.com> Message-ID: On Fri, 24 May 2024 07:32:36 GMT, Albert Mingkun Yang wrote: >> Can I please get some reviews for this change to remove some redundant #if / #ifdefs ? >> >> My search was just a simple grep + some bash script, though. 
I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. > > Can you merge master to re-trigger GHA? @albertnetymk - I updated the branch, and all GHA are passing. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19378#issuecomment-2135947899 From kvn at openjdk.org Tue May 28 20:18:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 28 May 2024 20:18:02 GMT Subject: RFR: JDK-8324341 : Remove redundant preprocessor #if's checks In-Reply-To: References: Message-ID: On Fri, 24 May 2024 02:01:36 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove some redundant #if / #ifdefs ? > > My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19378#pullrequestreview-2083769554 From sviswanathan at openjdk.org Tue May 28 20:28:16 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 28 May 2024 20:28:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578: >> >>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value. >>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask, hsPtrRet, needleLen, >>> 578: needle, haystack, hsLength, tmp1, tmp2, tmp3, rScratch, ae, _masm); >> >> If we run out of haystack instead of jumping to L_checkRangeAndReturn, we could directly jump to L_retrunError. > > Again, I think we ought to leave this in. Although it executes ~3 instructions that may not be necessary in some cases I think it's best to perform the check. 
Once we have a good enough test to check reading past the end of the haystack we can change it. In this particular case, we are returning -1 (NoMatch), so no need to do L_checkRangeAndReturn here, we could directly jump to L_returnError. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617853337 From sviswanathan at openjdk.org Tue May 28 20:32:16 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 28 May 2024 20:32:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: <7FujNShE9NvvlsGRZUR061xtnF-PCD8k8fmkM2kCS1I=.25525aec-f0bd-4587-b571-78d5dedc7d55@github.com> On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278: >> >>> 276: __ bind(L_nextCheck); >>> 277: __ testq(haystack_len_p, haystack_len_p); >>> 278: __ je(L_zeroCheckFailed); >> >> This check could be removed as the next check covers this one. > > No. This is checking for a zero length haystack. The following compare checks for needle length longer than haystack, regardless of the value in each. The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity. 
But we have already checked for needle length to be greater than 0 in the following lines: __ cmpq(needle_len_p, 0); __ jg_b(L_nextCheck); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617857240 From sviswanathan at openjdk.org Tue May 28 20:40:17 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 28 May 2024 20:40:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333: >> >>> 1331: >>> 1332: __ cmpq(nMinusK, 32); >>> 1333: __ jae_b(L_greaterThan32); >> >> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31 >> __ cmpq(nMinusK, 31); >> __ jae_b(L_greaterThan32); > > No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name. We can also do full reads for (n-k) == 31, as we also compare the kth byte. >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750: >> >>> 1748: // r15 = unused >>> 1749: // XMM_BYTE_0 - first element of needle, broadcast >>> 1750: // XMM_BYTE_K - last element of needle, broadcast >> >> This comment is duplicated for both small haystack case and big haystack case, could be made a common comment. >> Also the only registers that are used as input in the switch case are: >> r14 = needle >> rbx = haystack >> rsi = haystack length (n) >> r12 = needle length (k) >> r10 = n - k (where k is needle length) >> XMM_BYTE_0 = first element of needle, broadcast >> XMM_BYTE_K = last element of needle, broadcast >> So we could only list these, making it easier to comprehend. > > I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity. Sounds good. 
We could keep only one comment out of the two as it is the same for both small haystack and big haystack. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617862799 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617865049 From sgibbons at openjdk.org Tue May 28 20:54:15 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 20:54:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: <7FujNShE9NvvlsGRZUR061xtnF-PCD8k8fmkM2kCS1I=.25525aec-f0bd-4587-b571-78d5dedc7d55@github.com> References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> <7FujNShE9NvvlsGRZUR061xtnF-PCD8k8fmkM2kCS1I=.25525aec-f0bd-4587-b571-78d5dedc7d55@github.com> Message-ID: On Tue, 28 May 2024 20:29:38 GMT, Sandhya Viswanathan wrote: >> No. This is checking for a zero length haystack. The following compare checks for needle length longer than haystack, regardless of the value in each. The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity. > > But we have already checked for needle length to be greater than 0 in the following lines: > __ cmpq(needle_len_p, 0); > __ jg_b(L_nextCheck); OK >> Again, I think we ought to leave this in. Although it executes ~3 instructions that may not be necessary in some cases I think it's best to perform the check. Once we have a good enough test to check reading past the end of the haystack we can change it. > > In this particular case, we are returning -1 (NoMatch), so no need to do L_checkRangeAndReturn here, we could directly jump to L_returnError. OK.
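[Editorial sketch, not part of the original message.] The exchange quoted above about the zero-length haystack check comes down to the order of two signed comparisons. The Java below is a minimal sketch, assuming the "following compare" rejects the input when the needle length k is greater than the haystack length n. The class and method names are hypothetical.

```java
public final class ArgCheckOrder {
    // Signed comparison, mirroring a "needle longer than haystack" test.
    static boolean needleLongerThanHaystack(long n, long k) {
        return k > n;
    }

    public static void main(String[] args) {
        // Without a prior k > 0 guard, n = 0 and k = -1 slips through,
        // which is the case raised in the first reply.
        System.out.println(needleLongerThanHaystack(0, -1));

        // Once k > 0 has been established, an empty haystack is always
        // rejected by the same compare, so a separate n == 0 test adds nothing.
        for (long k = 1; k <= 1000; k++) {
            if (!needleLongerThanHaystack(0, k)) {
                throw new AssertionError("k=" + k + " not rejected for n=0");
            }
        }
        System.out.println("n == 0 is covered whenever k > 0");
    }
}
```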
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617876757 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617874637 From sgibbons at openjdk.org Tue May 28 20:59:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 20:59:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 20:35:26 GMT, Sandhya Viswanathan wrote: >> No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name. > > We can also do full reads for (n-k) == 31, as we also compare the kth byte. I'll change and test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617883225 From duke at openjdk.org Tue May 28 21:06:14 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 28 May 2024 21:06:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 17:36:03 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: >> >>> 486: __ cmpq(r11, nMinusK); >>> 487: __ ja_b(L_return); >>> 488: __ movq(rax, r11); >> >> At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label. > > I don't want to change this because its reason for existence is to ensure we don't return a value that's beyond the end of the haystack. We don't yet have a good enough test to validate whether we're reading past the end of the haystack, so I like this as insurance. I would recommend an experiment. Disable the range-check and run String/IndexOf.java test. Particularly run test4(), which is designed exactly to test the reads beyond the end. 
It won't find all the bad reads, but right now if there are any failures, they are 'hidden' by this range-check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617888680 From sgibbons at openjdk.org Tue May 28 21:06:14 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:06:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 20:56:42 GMT, Scott Gibbons wrote: >> We can also do full reads for (n-k) == 31, as we also compare the kth byte. > > I'll change and test. Passes tests, so I'll change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617886613 From sgibbons at openjdk.org Tue May 28 21:06:15 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:06:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 20:37:43 GMT, Sandhya Viswanathan wrote: >> I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity. > > Sounds good. We could keep only one comment out of the two as it is the same for both small haystack and big haystack.
OK ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617889756 From sgibbons at openjdk.org Tue May 28 21:12:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:12:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v37] In-Reply-To: References: <4xYUBsOJ_eDSuj6w9AjUo_6gFN_9piWR-ChLrHQoXl4=.88756684-8e9c-48e3-8b59-f5f684b81cde@github.com> Message-ID: On Fri, 24 May 2024 20:42:12 GMT, Scott Gibbons wrote: >> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 185: >> >>> 183: } >>> 184: >>> 185: private static int indexOfKernel(String haystack, String needle) { >> >> Is the intention of kernels not to be inlined so that it would be part of separate compilation? >> >> If so, you probably want to annotate it with `@CompilerControl(CompilerControl.Mode.DONT_INLINE)` >> >> i.e. https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java > > Fixed. CompilerControl is unavailable here. Added a runtime option instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617894475 From sgibbons at openjdk.org Tue May 28 21:12:16 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:12:16 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v38] In-Reply-To: <7jqyfDXW_EbstH_s90Fp4O7a214ZaejdM0CyAffzOHs=.544c7a91-c66b-4487-a2bf-0b8e300a94c0@github.com> References: <7jqyfDXW_EbstH_s90Fp4O7a214ZaejdM0CyAffzOHs=.544c7a91-c66b-4487-a2bf-0b8e300a94c0@github.com> Message-ID: On Tue, 28 May 2024 16:57:54 GMT, Vladimir Kozlov wrote: >> @vnkozlov I'm getting an error in CI tests with this line added. Can you please advise? >> >> `TEST RESULT: Error. Parse Exception: Syntax error in @requires expression: invalid name: vm.cpu.features` > > You need to add `vm.cpu.features ` line to `test/jdk/TEST.ROOT` file. Similar to what we have in `test/hotspot/jtreg/TEST.ROOT` Fixed. Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617893462 From sgibbons at openjdk.org Tue May 28 21:20:15 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 21:20:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: On Tue, 28 May 2024 16:37:23 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: > >> 486: __ cmpq(r11, nMinusK); >> 487: __ ja_b(L_return); >> 488: __ movq(rax, r11); > > At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label. Disabling causes the test to succeed, so we're not finding matches beyond the end of the string, correct? Are we confident that this test passing can warrant removing the range check? @sviswa7 ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617901070 From lmesnik at openjdk.org Tue May 28 22:29:28 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 28 May 2024 22:29:28 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v5] In-Reply-To: References: Message-ID: <2Aorg4EW1Sl5s0tplzUb89ZNUeZg2xsPj3VkJQflzN4=.9072eee0-c481-4da9-ade9-5595ab78030f@github.com> > The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. > > It happens when thread_name is set for tracing from jvmti functions. 
> See: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 > > The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. > > The change should affect JVMTI trace mode only (-XX:TraceJVMTI). > > Verified by running jvmti/jdi/jdb tests with tracing enabled. Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision: - fixed space. - The result is updated. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19275/files - new: https://git.openjdk.org/jdk/pull/19275/files/12ddfca2..81dc4073 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19275&range=03-04 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19275/head:pull/19275 PR: https://git.openjdk.org/jdk/pull/19275 From lmesnik at openjdk.org Tue May 28 22:29:29 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 28 May 2024 22:29:29 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v4] In-Reply-To: References: Message-ID: On Fri, 17 May 2024 22:31:32 GMT, Leonid Mesnik wrote: >> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. >> >> It happens when thread_name is set for tracing from jvmti functions. >> See: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 >> >> The setup is called and the thread name is used in tracing before the thread transition. 
There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. >> >> The change should affect JVMTI trace mode only (-XX:TraceJVMTI). >> >> Verified by running jvmti/jdi/jdb tests with tracing enabled. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > wrong thread state I discussed the issue with @sspitsyn and after this, I updated the function to return something more descriptive in case we can't read thread state. (Easier to understand what happens.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19275#issuecomment-2136203620 From sgibbons at openjdk.org Tue May 28 22:33:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 22:33:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v43] In-Reply-To: References: <8iJLAJvIzHgXpr5P1hWOLTj-bfO6THNUJkQf7Ki2P9Y=.43680b7c-c3e1-41f8-b065-7955e6237613@github.com> Message-ID: <2OsgJsQtfArLRfrVbwvYJKpx3ljhT2fU3UUdWJsUiCY=.91914663-2fef-4696-b1d8-4f7b0c951205@github.com> On Tue, 28 May 2024 21:17:07 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: >> >>> 486: __ cmpq(r11, nMinusK); >>> 487: __ ja_b(L_return); >>> 488: __ movq(rax, r11); >> >> At places where we know that return value in r11 is correct, we dont need to checkRange so this could have its own label. > > Disabling causes the test to succeed, so we're not finding matches beyond the end of the string, correct? Are we confident that this test passing can warrant removing the range check? @sviswa7 ? Removed. 
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621: >> >>> 619: __ addq(hsPtrRet, index); >>> 620: __ movq(r11, hsPtrRet); >>> 621: __ jmp(L_checkRangeAndReturn); >> >> Why do we have to checkRange here? Would it not always be correct? If so, we could return r11 directly (by moving into rax). > > There are cases where r11 could have a value that, when added to (k - 1), would go past the end of the haystack. I did all in my power to ensure that it doesn't, but there's no test I know of to ensure that condition. I would recommend leaving this in for now. Removed checkRangeAndReturn ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617956870 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617956635 From sgibbons at openjdk.org Tue May 28 22:33:19 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 22:33:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v19] In-Reply-To: References: <8Y-nIHc8vfB1X_hp3tpqqqgpCzu6dAt6BBIP_zc4Q70=.c9a48c68-8c14-4af9-8357-ab50e62a5fd3@github.com> Message-ID: On Thu, 16 May 2024 18:09:04 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 418: >> >>> 416: __ cmpq(haystack_len, 0x10); >>> 417: __ ja_b(L_moreThan16); >>> 418: >> >> An assert here to check for header size >= 16 would be good. >> Also a comment here would be good, something like: >> // Copy 16 or 32 bytes prior to haystack end onto stack >> // This will possibly include some object header bytes when haystack length is less than 16 or 32 bytes // Set the new haystack address to beginning of copied haystack on stack adjusting for extra bytes copied > > I don't know how to assert header size >= 16 bytes, so I'll add a comment stating such. If you can tell me how to assert, I'll add that code in place of the comment.
Fixed in library_call.cpp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617955173 From sgibbons at openjdk.org Tue May 28 22:47:42 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 22:47:42 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v46] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Final review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/751aace8..355325d0 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=44-45 Stats: 95 lines in 3 files changed: 23 ins; 51 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From amenkov at openjdk.org Tue May 28 23:07:03 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 28 May 2024 23:07:03 GMT Subject: RFR: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 01:54:24 GMT, Alex Menkov wrote: >> Some cleanup related to JvmtiEnvBase::get_threadOop_and_JavaThread method >> >> Testing: tier1-6 > > Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: > > - update > - Revert "renamed current_thread to current" > > This reverts commit d5d614bcf0861466acd695296e974d2253f84c9f. > - Revert "renamed current_thread tp current" > > This reverts commit 4602632221044aa754a1bc8d11e7a3e9a0092590. Ping. Can I get second review please. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18986#issuecomment-2136243867 From sgibbons at openjdk.org Tue May 28 23:52:27 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 28 May 2024 23:52:27 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Move assert to where it's actually important. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/355325d0..db0ab75a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=45-46 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sspitsyn at openjdk.org Wed May 29 01:02:45 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 May 2024 01:02:45 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes Message-ID: Please review the following `interp-only` issue related to carrier threads. There are 3 problems fixed here: - The `EnterInterpOnlyModeClosure::do_threads` takes the `JvmtiThreadState` from `jt->jvmti_thread_state()`, which is incorrect when we are dealing with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. - The `state->is_pending_interp_only_mode()` condition was processed at mounts only, but it has to be processed at unmounts as well. - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` wrongly assumes that there can't be a `MethodExit` event on the carrier thread while the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. The fix also includes a new test case `vthread/CarrierThreadEventNotification` developed by Patricio.
Testing: - Ran new test case locally - Ran mach5 tiers 1-6 ------------- Commit messages: - fix trailing spaces in new test - 8311177: Switching to interpreter only mode in carrier thread can lead to crashes Changes: https://git.openjdk.org/jdk/pull/19438/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19438&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311177 Stats: 251 lines in 7 files changed: 231 ins; 9 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19438/head:pull/19438 PR: https://git.openjdk.org/jdk/pull/19438 From sspitsyn at openjdk.org Wed May 29 01:07:06 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 May 2024 01:07:06 GMT Subject: RFR: 8332917: failure_handler should execute gdb "info threads" command on linux In-Reply-To: References: Message-ID: On Fri, 24 May 2024 19:45:21 GMT, Chris Plummer wrote: > On linux, failure_handler dumps stack traces for all threads, but this dump does not include the name of each thread. The gdb "info threads" command will give a summary of all threads, and if debugging process, the summary will include each thread's name. If debugging a core file, for some reason the thread name is not included, but the summary is still useful. > > Tested by running some tests that fail with a timeout, and looking at the failure_handler gdb output for both the process and the core file. Marked as reviewed by sspitsyn (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/19401#pullrequestreview-2084069635

From sspitsyn at openjdk.org  Wed May 29 01:23:06 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 29 May 2024 01:23:06 GMT
Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v5]
In-Reply-To: <2Aorg4EW1Sl5s0tplzUb89ZNUeZg2xsPj3VkJQflzN4=.9072eee0-c481-4da9-ade9-5595ab78030f@github.com>
References: <2Aorg4EW1Sl5s0tplzUb89ZNUeZg2xsPj3VkJQflzN4=.9072eee0-c481-4da9-ade9-5595ab78030f@github.com>
Message-ID:

On Tue, 28 May 2024 22:29:28 GMT, Leonid Mesnik wrote:

>> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state.
>>
>> It happens when thread_name is set for tracing from jvmti functions.
>> See:
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649
>>
>> The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state.
>>
>> The change should affect JVMTI trace mode only (-XX:TraceJVMTI).
>>
>> Verified by running jvmti/jdi/jdb tests with tracing enabled.
>
> Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision:
>
>  - fixed space.
>  - The result is updated.

This looks good. Posted one nit though.

src/hotspot/share/prims/jvmtiTrace.cpp line 284:

> 282:   JavaThreadState current_state = JavaThread::cast(Thread::current())->thread_state();
> 283:   if (current_state == _thread_in_native || current_state == _thread_blocked) {
> 284:     return "not readable";

Nit: I'd suggest to make it more detailed, something like this:
 "" or ""

-------------

Marked as reviewed by sspitsyn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19275#pullrequestreview-2084079674 PR Review Comment: https://git.openjdk.org/jdk/pull/19275#discussion_r1618051643 From cjplummer at openjdk.org Wed May 29 01:51:06 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 29 May 2024 01:51:06 GMT Subject: RFR: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly [v4] In-Reply-To: References: Message-ID: On Fri, 3 May 2024 01:54:24 GMT, Alex Menkov wrote: >> Some cleanup related to JvmtiEnvBase::get_threadOop_and_JavaThread method >> >> Testing: tier1-6 > > Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: > > - update > - Revert "renamed current_thread to current" > > This reverts commit d5d614bcf0861466acd695296e974d2253f84c9f. > - Revert "renamed current_thread tp current" > > This reverts commit 4602632221044aa754a1bc8d11e7a3e9a0092590. Looks good. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18986#pullrequestreview-2084100945 From sviswanathan at openjdk.org Wed May 29 03:05:18 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 29 May 2024 03:05:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>>
>> Benchmark                                  Score     Latest  Latest/Score
>> StringIndexOf.advancedWithMediumSub      343.573    317.934  0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96  1.014319384x
>> StringIndexOf.advancedWithShortSub2       55.828    110.541  1.980027943x
>> StringIndexOf.constantPattern              9.361     11.906  1.271872663x
>> StringIndexOf.searchCharLongSuccess        4.216      4.218  1.000474383x
>> StringIndexOf.searchCharMediumSuccess      3.133      3.216  1.02649218x
>> StringIndexOf.searchCharShortSuccess        3.76      3.761  1.000265957x
>> StringIndexOf.success                      9.186      9.713  1.057369911x
>> StringIndexOf.successBig                  14.341     46.343  3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918   12154.52  1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556   5540.044  1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854   6818.689  0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499   5474.624  0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541   6863.359  0.962260014x
>> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123  14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671   9782.245  0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move assert to where it's actually important.

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2084177134 From ccheung at openjdk.org Wed May 29 05:05:06 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 29 May 2024 05:05:06 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v4] In-Reply-To: References: <7AWghiG_TSVMjkfVfA_krBMWZNMRVlakI7kny1tuJ9s=.d4ca3b29-923a-48e6-80d7-97c72ea6e308@github.com> Message-ID: On Mon, 27 May 2024 04:19:08 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> @dholmes-ora comments > > src/hotspot/share/cds/dynamicArchive.cpp line 123: > >> 121: >> 122: log_info(cds,dynamic)("CDS dynamic dump: clinit = " INT64_FORMAT "ms)", >> 123: (int64_t)ClassLoader::class_init_time_ms()); > > Nit: just use JLONG_FORMAT and avoid the cast Fixed. > src/hotspot/share/classfile/classLoader.cpp line 144: > >> 142: log.print_cr("ClassLoader:"); >> 143: log.print_cr(" clinit: " INT64_FORMAT "ms / " INT64_FORMAT " events", (int64_t)ClassLoader::class_init_time_ms(), (int64_t)ClassLoader::class_init_count()); >> 144: log.print_cr(" link methods: " INT64_FORMAT "ms / " INT64_FORMAT " events", (int64_t)Management::ticks_to_ms(_perf_ik_link_methods_time->get_value()) , (int64_t)_perf_ik_link_methods_count->get_value()); > > Why are you casting all the jlong values to int64_t instead of just using JLONG_FORMAT? I've changed them to JLONG_FORMAT and removed the casting. > src/hotspot/share/runtime/java.cpp line 165: > >> 163: ClassLoader::print_counters(); >> 164: } >> 165: } > > This method seems unnecessary. Inside `print_counters` it checks if the log is enabled and whether `ProfileClassLinkage` is set, so no need to check the log is enabled here. Wherever this is called you should just call `ClassLoader::print_counters` directly. (Further the "init" part of the name is only meaningful for the call site at the end of VM initialization.) 
This function will cover other sets of counters in the future. Maybe changing its name to `log_vm_stats`? > src/hotspot/share/runtime/java.cpp line 367: > >> 365: ThreadsSMRSupport::log_statistics(); >> 366: >> 367: log_vm_init_stats(); > > Do we really want to call `ClassLoader::print_counters` here? IIUC most everything else here is printing to tty, but `ClassLoader::print_counters` will "print" to whereever the logging has been configured. (` ThreadsSMRSupport::log_statistics` seems similarly misplaced as it too uses logging). If using `tty`, the output would lose the logging tag. The output would look as follows: ClassLoader: clinit: 11ms / 285 events link methods: 13ms / 7493 events method adapters: 12ms / 571 events versus with logging tag: [0.094s][info][init] ClassLoader: [0.094s][info][init] clinit: 11ms / 278 events [0.094s][info][init] link methods: 13ms / 7336 events [0.094s][info][init] method adapters: 12ms / 571 events > src/hotspot/share/runtime/perfData.hpp line 838: > >> 836: } >> 837: >> 838: const char* name() const { return (_timerp != nullptr) ? _timerp->name() : nullptr; } > > So now all the callers of this need a null check too. I wonder if this should just be an assertion check, as we should only ever call this when we have encountered a valid/live counter. I've changed it to the following: const char* name() const { assert(_timerp != nullptr, "sanity"); return _timerp->name(); } > src/hotspot/share/runtime/threads.cpp line 832: > >> 830: >> 831: if (ProfileClassLinkage) { >> 832: log_info(init)("Before main:"); > > "Before main: " ?? That seems very launcher specific. How about "At VM initialization completion"? Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618181043 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618181124 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618181402 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618181476 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618181207 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618181263 From ccheung at openjdk.org Wed May 29 05:05:06 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 29 May 2024 05:05:06 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> Message-ID: On Mon, 27 May 2024 04:26:16 GMT, David Holmes wrote: >> If only `ProfileClassLinkage` is set to true without `-Xlog:init`, the user will not see any counters output. >> In `java.cpp`: >> >> 160 void log_vm_init_stats() { >> 161 LogStreamHandle(Info, init) log; >> 162 if (log.is_enabled()) { >> 163 ClassLoader::print_counters(); >> 164 } >> 165 } >> >> >> In the future, there will be other sets of counters controlled by other diagnostic flags. > > Yeah I'm not really getting the control aspects here. If I turn on logging I should not get these new counters unless I explicitly ask for them - simply turning on the logging should not set `ProfileClassLinkage` IMO. But enabling `ProfileClassLinkage` should turn on `init` logging, else it serves no purpose. We are planning to add more diagnostic flags to control different sets of counters. With the current design, the user just needs to specify `-Xlog:init` to enable all the "new" counters. If the `init` logging is enabled by individual flag, the user needs to enable individual flag in the command line. Anyway, I think the follow would achieve what you are alluding to? 
if (FLAG_IS_CMDLINE(ProfileClassLinkage) && !log_is_enabled(Info, init)) { LogConfiguration::configure_stdout(LogLevel::Info, true, LOG_TAGS(init)); } I think it's better to keep the current change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618180989 From dholmes at openjdk.org Wed May 29 07:31:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 May 2024 07:31:05 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v4] In-Reply-To: References: <7AWghiG_TSVMjkfVfA_krBMWZNMRVlakI7kny1tuJ9s=.d4ca3b29-923a-48e6-80d7-97c72ea6e308@github.com> Message-ID: On Wed, 29 May 2024 05:02:05 GMT, Calvin Cheung wrote: >> src/hotspot/share/runtime/java.cpp line 165: >> >>> 163: ClassLoader::print_counters(); >>> 164: } >>> 165: } >> >> This method seems unnecessary. Inside `print_counters` it checks if the log is enabled and whether `ProfileClassLinkage` is set, so no need to check the log is enabled here. Wherever this is called you should just call `ClassLoader::print_counters` directly. (Further the "init" part of the name is only meaningful for the call site at the end of VM initialization.) > > This function will cover other sets of counters in the future. Maybe changing its name to `log_vm_stats`? Regardless there seems to be confusion about which method should be responsible for checking if the requisite logging is enabled. They should not both do it. >> src/hotspot/share/runtime/java.cpp line 367: >> >>> 365: ThreadsSMRSupport::log_statistics(); >>> 366: >>> 367: log_vm_init_stats(); >> >> Do we really want to call `ClassLoader::print_counters` here? IIUC most everything else here is printing to tty, but `ClassLoader::print_counters` will "print" to whereever the logging has been configured. (` ThreadsSMRSupport::log_statistics` seems similarly misplaced as it too uses logging). > > If using `tty`, the output would lose the logging tag. 
The output would look as follows: > > ClassLoader: > clinit: 11ms / 285 events > link methods: 13ms / 7493 events > method adapters: 12ms / 571 events > > versus with logging tag: > > [0.094s][info][init] ClassLoader: > [0.094s][info][init] clinit: 11ms / 278 events > [0.094s][info][init] link methods: 13ms / 7336 events > [0.094s][info][init] method adapters: 12ms / 571 events Yes I understand that, but this method is generally printing a ton of stuff to the tty - that is what it is for. If we want to add such stuff to the output then it too should just go to the tty - else it doesn't belong in this method IMO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618341748 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618343557 From aph at openjdk.org Wed May 29 07:39:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 07:39:08 GMT Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v2] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 14:04:13 GMT, Martin Doerr wrote: > Performance seems to be not affected by that bug. That is extremely suspicious. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19368#issuecomment-2136738622 From stefank at openjdk.org Wed May 29 07:42:02 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 29 May 2024 07:42:02 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> <8_W3gPFqX8RC7V2QvFSmKAOTEK4z6uHOf4NnA0RDp7A=.428d32d1-0624-48de-ac7b-0f1acc6a0a14@github.com> Message-ID: On Mon, 27 May 2024 21:25:12 GMT, Afshin Zafari wrote: >> My point is that the `archive_space_rs` and `class_space_rs` can get the wrong flags assigned to them. The split functions don't change them. Right? >> >> I would like to see the code run through our testing with these checks: >> >> assert(archive_space_rs.nmt_flag() == mtClassShared, "Sanity"); >> assert(class_space_rs.nmt_flag() == mtClass, "Sanity"); > > The call to `MemTracker::record_virtual_memory_split_reserved` at line 1364, takes two flags for the split parts. The corresponding regions in NMT take that flags. > The sanity assertions will be added anyway. I think you are still missing my point. `record_virtual_memory_split_reserved` doesn't update the `archive_space_rs` and `class_space_rs` instances with the correct flag. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1618362958

From aph at openjdk.org  Wed May 29 07:43:06 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 29 May 2024 07:43:06 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v3]
In-Reply-To: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com>
References: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com>
Message-ID:

On Tue, 28 May 2024 14:59:14 GMT, Martin Doerr wrote:

>> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
>> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea?
>> How can we verify it? By comparing the performance using the micro benchmarks?
>>
>> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>>
>> Original
>> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
>> SecondarySuperCacheInterContention.test     avgt  15  432.366 ±  8.364  ns/op
>> SecondarySuperCacheInterContention.test:t1  avgt  15  432.310 ±  8.460  ns/op
>> SecondarySuperCacheInterContention.test:t2  avgt  15  432.422 ± 10.819  ns/op
>> SecondarySuperCacheIntraContention.test     avgt  15  355.192 ±  3.597  ns/op
>> SecondarySupersLookup.testNegative00        avgt  15   12.274 ±  0.026  ns/op
>> SecondarySupersLookup.testNegative01        avgt  15   12.300 ±  0.039  ns/op
>> SecondarySupersLookup.testNegative02        avgt  15   12.304 ±  0.034  ns/op
>> SecondarySupersLookup.testNegative03        avgt  15   12.276 ±  0.050  ns/op
>> SecondarySupersLookup.testNegative04        avgt  15   12.235 ±  0.044  ns/op
>> SecondarySupersLookup.testNegative05        avgt  15   12.308 ±  0.156  ns/op
>> SecondarySupersLookup.testNegative06        avgt  15   12.291 ±  0.048  ns/op
>> SecondarySupersLookup.testNegative07        avgt  15   12.307 ±  0.052  ns/op
>> SecondarySupersLookup.testNegative08        avgt  15   12.398 ±  0.075  ns/op
>> SecondarySupersLookup.testNegative09        avgt  15   12.552 ±  0.122  ns/op
>> SecondarySupersLookup.testNegative10        avgt  15   12.490 ±  0.083  ns/op
>> SecondarySupersLookup.testNegative16        avgt  15   12.565 ±  0.092  ns/op
>> SecondarySupersLookup.testNegative20        avgt  15   19.059 ±  0.958  ns/op
>> SecondarySupersLookup.testNegative30        avgt  15   19.268 ±  0.124  ns/op
>> SecondarySupersLookup.testNegative32        avgt  15   20.059 ±  0.114  ns/op
>> SecondarySupersLookup.testNegative40        avgt  15   25.117 ±  0.368  ns/op
>> SecondarySupersLookup.testNegative50        avgt  15   32.735 ±  0.359  ns/op
>> SecondarySupersLookup.testNegative55        avgt  15   34.866 ±  0.152  ns/op
>> SecondarySupersLookup.testNegative56        avgt  15   35.492 ±  0.276  ns/op
>> SecondarySupersLookup.testNegative57        avgt  15   36.620 ±  0.334  ns/op
>> SecondarySupersLookup.testNegative58        avgt  15   37.226 ±  0.180  ns/op
>> SecondarySupersLookup.testNegative59        avgt  15   37.774 ±  0.241  ns/op
>> SecondarySupersLookup.testNegative60        avgt  15   38.627 ±  1.451  ns/op
>> SecondarySupersLookup.testNegative61        avgt  15  ...
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Adapt assertion. We sometimes have only 1 element in the secondary supers array.

src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2201:

> 2199:   li(result, 1); // failure
> 2200:   // We test the MSB of r_array_index, i.e. its sign bit
> 2201:   bgt(CCR0, L_fallthrough);

This looks wrong. Should be greater or equal.
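
The review point can be illustrated outside the assembler with a short C++ sketch. It assumes, as the quoted code comment says, that the check is meant to detect a clear most significant bit by comparing the register, read as a signed 64-bit value, against zero; the helper names are hypothetical and the cast relies on two's complement representation. A clear sign bit means the signed value is greater than or equal to zero, so a strict greater-than test misses the all-zero case.

```cpp
#include <cassert>
#include <cstdint>

// True when the most significant bit of x is clear,
// i.e. x read as a signed 64-bit value is >= 0.
inline bool msb_clear(uint64_t x) {
  return static_cast<int64_t>(x) >= 0;
}

// Strict comparison, analogous to branching on "greater than".
// It wrongly reports the MSB as set when x == 0.
inline bool msb_clear_strict(uint64_t x) {
  return static_cast<int64_t>(x) > 0;
}
```

The two helpers agree for every input except zero, which is exactly the case a "greater or equal" branch is needed for.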
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1618364435

From jsjolen at openjdk.org  Wed May 29 07:45:02 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 29 May 2024 07:45:02 GMT
Subject: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp
In-Reply-To:
References:
Message-ID:

On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote:

> In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea.
>
> It contains a workaround:
>
>
>     // lazily create the single threaded resource area
>     // pick a size which is not a standard since the pools don't exist yet
>     _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size);
>
>
> It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet.
>
> This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment.
>
> ---
>
> Tests: GHAs.
> I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life.

Today, the ChunkPools are allocated before main through static initialization. That means that the ChunkPools exists when main starts executing, so this is safe.

-------------

Marked as reviewed by jsjolen (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19425#pullrequestreview-2084558186

From mdoerr at openjdk.org  Wed May 29 08:14:29 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 29 May 2024 08:14:29 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v4]
In-Reply-To:
References:
Message-ID:

> PPC64 implementation of [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450). Please review!
> I noticed that `r_array_length` is sometimes 0 and I don't see code for that on x86. Any idea?
> How can we verify it? By comparing the performance using the micro benchmarks?
>
> Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):
>
> Original
> SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
> SecondarySuperCacheInterContention.test     avgt  15  432.366 ±  8.364  ns/op
> SecondarySuperCacheInterContention.test:t1  avgt  15  432.310 ±  8.460  ns/op
> SecondarySuperCacheInterContention.test:t2  avgt  15  432.422 ± 10.819  ns/op
> SecondarySuperCacheIntraContention.test     avgt  15  355.192 ±  3.597  ns/op
> SecondarySupersLookup.testNegative00        avgt  15   12.274 ±  0.026  ns/op
> SecondarySupersLookup.testNegative01        avgt  15   12.300 ±  0.039  ns/op
> SecondarySupersLookup.testNegative02        avgt  15   12.304 ±  0.034  ns/op
> SecondarySupersLookup.testNegative03        avgt  15   12.276 ±  0.050  ns/op
> SecondarySupersLookup.testNegative04        avgt  15   12.235 ±  0.044  ns/op
> SecondarySupersLookup.testNegative05        avgt  15   12.308 ±  0.156  ns/op
> SecondarySupersLookup.testNegative06        avgt  15   12.291 ±  0.048  ns/op
> SecondarySupersLookup.testNegative07        avgt  15   12.307 ±  0.052  ns/op
> SecondarySupersLookup.testNegative08        avgt  15   12.398 ±  0.075  ns/op
> SecondarySupersLookup.testNegative09        avgt  15   12.552 ±  0.122  ns/op
> SecondarySupersLookup.testNegative10        avgt  15   12.490 ±  0.083  ns/op
> SecondarySupersLookup.testNegative16        avgt  15   12.565 ±  0.092  ns/op
> SecondarySupersLookup.testNegative20        avgt  15   19.059 ±  0.958  ns/op
> SecondarySupersLookup.testNegative30        avgt  15   19.268 ±  0.124  ns/op
> SecondarySupersLookup.testNegative32        avgt  15   20.059 ±  0.114  ns/op
> SecondarySupersLookup.testNegative40        avgt  15   25.117 ±  0.368  ns/op
> SecondarySupersLookup.testNegative50        avgt  15   32.735 ±  0.359  ns/op
> SecondarySupersLookup.testNegative55        avgt  15   34.866 ±  0.152  ns/op
> SecondarySupersLookup.testNegative56        avgt  15   35.492 ±  0.276  ns/op
> SecondarySupersLookup.testNegative57        avgt  15   36.620 ±  0.334  ns/op
> SecondarySupersLookup.testNegative58        avgt  15   37.226 ±  0.180  ns/op
> SecondarySupersLookup.testNegative59        avgt  15   37.774 ±  0.241  ns/op
> SecondarySupersLookup.testNegative60        avgt  15   38.627 ±  1.451  ns/op
> SecondarySupersLookup.testNegative61        avgt  15   39.395 ±  0.249  ns/op
> SecondarySupersLookup.testNegative62        avgt  15  ...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Fix check for sign bit.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19368/files
  - new: https://git.openjdk.org/jdk/pull/19368/files/c1840719..14fc650f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19368&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19368.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368

PR: https://git.openjdk.org/jdk/pull/19368

From mdoerr at openjdk.org  Wed May 29 08:14:29 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 29 May 2024 08:14:29 GMT
Subject: RFR: 8331117: [PPC64] secondary_super_cache does not scale well [v3]
In-Reply-To:
References: <0YQF8jE_JFiy_K34aIy6cybUwnpp47-6jrnmZ3jbcAI=.c6663758-17f6-40f8-a738-4e4bf7e9ddaf@github.com>
Message-ID:

On Wed, 29 May 2024 07:40:21 GMT, Andrew Haley wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Adapt assertion.
We sometimes have only 1 element in the secondary supers array. > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2201: > >> 2199: li(result, 1); // failure >> 2200: // We test the MSB of r_array_index, i.e. its sign bit >> 2201: bgt(CCR0, L_fallthrough); > > This looks wrong. Should be greater or equal. Right. Fixed. Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19368#discussion_r1618416597 From crschnick at xpipe.io Wed May 29 08:23:48 2024 From: crschnick at xpipe.io (Christopher Schnick) Date: Wed, 29 May 2024 10:23:48 +0200 Subject: [EXTERNAL] Re: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> Message-ID: <10c34c7d-fedc-4a55-909c-28180fb74093@xpipe.io> So is there any update on this? From the existing discussion, it was still not apparent whether the hotspot developers consider this being a problem that should be fixed properly. There were already a few possible solutions proposed in this thread. If the core hotspot developers are a little bit out of their element when it comes to deployment models in practice and how to best approach this, I am sure that you can consult with some trusted colleagues that deal with this more often and can maybe share their opinion on this. Best Christopher Schnick On 10/05/2024 18:40, Bruno Borges wrote: > Java runtime sharing (among multiple applications in the same > environment) has become less and less important, and I think that is > what those environment variables were meant for, to ensure any JVM > would start with values from these env vars. > > But deployment models have certainly evolved: > > * > More than half of Java applications in the Cloud are deployed as > containers (see New Relic report from 2023). 
>   * Java applications deployed to Virtual Machines tend to have
>     exclusivity over the VM resources. Example: big data solutions are
>     pushed to VMs dedicated to them.
>   * Developers tend to have multiple JDKs installed these days, from 8
>     all the way to 21. Expecting flags in those environment variables
>     to work consistently across all versions is unrealistic.
>   * Some developer tools have been shipping their own java runtimes
>     for quite some time already (e.g. JetBrains and Eclipse IDEs).
>
>
> I do like Christopher's suggestion of an option in the JVM to disable
> environment variable sourcing of _JAVA_OPTIONS and JAVA_TOOL_OPTIONS.
> It gives back control to the application developer on how the runtime
> should behave, especially in the scenario of Java desktop
> applications, and it would align with the intents of jlink/jpackage.
>
>
> ------------------------------------------------------------------------
> *From:* hotspot-dev on behalf of
> Christopher Schnick
> *Sent:* May 10, 2024 3:42 AM
> *To:* David Holmes
> *Cc:* hotspot-dev at openjdk.org
> *Subject:* [EXTERNAL] Re: External _JAVA_OPTIONS environment variable
> sourcing for self-contained applications
>
> From my perspective, it doesn't really matter which environment
> variable you're talking about. Even if there are small differences in
> which order they apply, they generally all cause the issue of a global
> configuration interfering with a local isolated self contained runtime
> image. So _JAVA_OPTIONS and JAVA_TOOL_OPTIONS cause the same problems,
> with only minor differences.
>
> In practice, global environment variables are intended for things like
> Java 8 applications that run via a globally installed JRE.
The huge > issue is that there is a chance of an option being included in there > that is not supported by more recent JVMs like one for Java 21. If this > is the case, then ALL self contained graphical Java applications don't > even start up due to an unrecognized option and don't show an error > message (If you are running a console based application, then it prints > something but for desktop applications there is nothing). As of right > now, there is no possibility of running a global JRE/JDK configured with > certain environment variable options on the same system as a self > contained Java application created with the available JDK tools if the > options are not exactly compatible. That problem is especially relevant > when running JVMs from different vendors for different applications as > they differentiate themselves through options. One incompatible option > is all it takes for nothing to run anymore. > > There are multiple different possibilities that I can think of to > somehow improve this situation: > > - Give developers the option to unset these variables in the > automatically generated launcher script for jlink. Technically one can > modify the launcher script manually, but since it is automatically > generated in the beginning, it would be nicer if jlink could do that > automatically. Also give developers the option to do the same thing in > the generated native jpackage launcher executable. There's currently no > other way in jpackage to set any environment variables. > > - Add some form of JVM option to disable environment variable sourcing > for other JVM options. That way this option could be passed in jlink and > jpackage, not requiring any modifications to the jlink and jpackage > tools. This would also be a good solution. Such an option would also be > useful for quick debugging in other cases. 
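
As a rough illustration of the first option above, a launcher could drop the JVM-related variables from the environment it hands to the child process. The sketch below is only an assumption about how such a wrapper might filter its environment; it does not describe how the jlink or jpackage launchers work. `scrub_java_env` is a hypothetical helper, and the blocked names are limited to the two variables discussed in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns a copy of env (entries of the form "NAME=value") without the
// variables that a JVM would otherwise pick up as extra options.
// Hypothetical helper for illustration only.
std::vector<std::string> scrub_java_env(const std::vector<std::string>& env) {
  static const char* const kBlocked[] = {"_JAVA_OPTIONS=", "JAVA_TOOL_OPTIONS="};
  std::vector<std::string> result;
  for (const std::string& entry : env) {
    bool blocked = false;
    for (const char* prefix : kBlocked) {
      if (entry.rfind(prefix, 0) == 0) {  // entry starts with prefix
        blocked = true;
        break;
      }
    }
    if (!blocked) {
      result.push_back(entry);
    }
  }
  return result;
}
```

A native launcher would pass the filtered list to whatever process-creation call it uses. Matching on the full `NAME=` prefix keeps unrelated variables such as `MY_JAVA_OPTIONS` untouched.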
>
> On 10/05/2024 01:47, David Holmes wrote:
> > On 9/05/2024 5:40 pm, Alan Bateman wrote:
> >> On 09/05/2024 08:03, David Holmes wrote:
> >>>
> >>> How does such a jpackaged application actually launch/load the JVM?
> >>> I'm wondering if there is a way to insert a new "shell" environment
> >>> to launch the JVM without having those env vars present ... though I
> >>> guess there may be other env vars that your application still needs.
> >>
> >> For modular applications, there is a jlink option to generate a
> >> launcher (script) for the application. That's a potential place to
> >> unset environment variables that shouldn't be inherited. It may not
> >> help here as it sounds like this is an application image produced by
> >> jpackage with a native launcher, and the warning message is hidden as
> >> there is no console (I assume).
> >>
> >> I think we should consider deprecating and eventually removing
> >> _JAVA_OPTIONS. It's always been problematic that it appends rather
> >> than prepend and it has issues in areas such as quoting. When
> >> JDK_JAVA_OPTIONS was added then we had hoped that developers would
> >> move from the undocumented env variable. The new env variable fixes a
> >> bunch of things in the areas of quoting, arg files, works with
> >> launcher options, and it of course prepends so it doesn't override
> >> options.
> >
> > I think overriding options was a feature of `_JAVA_OPTIONS` not a bug
> > - at least at the time. :) But deployment models have evolved (to a
> > point where I don't even know/understand how things get deployed these
> > days and who has control of the command-line and/or the env!).
> > Deprecation may be a reasonable thing but doesn't help the current
> > situation.
> >
> > David
> >
> >> -Alan

From ayang at openjdk.org  Wed May 29 08:41:05 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Wed, 29 May 2024 08:41:05 GMT
Subject: RFR: JDK-8324341 : Remove redundant preprocessor #if's checks
In-Reply-To:
References:
Message-ID:

On Fri, 24 May 2024 02:01:36 GMT, Cesar Soares Lucas wrote:

> Can I please get some reviews for this change to remove some redundant #if / #ifdefs ?
>
> My search was just a simple grep + some bash script, though. I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3.

Marked as reviewed by ayang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/19378#pullrequestreview-2084709516

From jsjolen at openjdk.org  Wed May 29 08:51:35 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 29 May 2024 08:51:35 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v114]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, and we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Improve tests - Use inner type def ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/90b6f6ae..885bc480 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=113 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=112-113 Stats: 59 lines in 4 files changed: 21 ins; 14 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From mbaesken at openjdk.org Wed May 29 09:14:10 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 09:14:10 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage Message-ID: When running with ubsan-enabled binaries (--enable-ubsan), the vmTestbase/nsk/jdi tests detect some cases of memset on nullptr destinations in `get_object_monitor_usage`: // null out memory for robustness memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); We should probably add checks there. 
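Editorial sketch of the kind of check suggested above. It is not the change that was integrated; it only illustrates the idea under the assumption that a null destination should simply be skipped. The helper name `zero_if_present` is hypothetical. The check is needed because `memset` is declared with a non-null destination in common C libraries, so passing a null pointer is reported by ubsan even when the byte count is zero.

```c++
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: zero `count` elements starting at `dst`, but do not
// call memset at all when there is no destination buffer or nothing to clear.
template <typename T>
void zero_if_present(T* dst, std::size_t count) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memset is only safe for trivially copyable types");
  if (dst != nullptr && count > 0) {
    std::memset(dst, 0, count * sizeof(T));
  }
}
```

With a helper like this, each of the two calls in the report would become a single guarded call, for example `zero_if_present(ret.waiters, ret.waiter_count)`, assuming the count is non-negative when converted to `size_t`.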
Example : vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 debugee.stderr> #1 0x7f1e09387bd7 in VM_GetObjectMonitorUsage::doit() 
src/hotspot/share/prims/jvmtiEnvBase.hpp:594 debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 debugee.stderr> #3 0x7f1e0a7eac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 debugee.stderr> #4 0x7f1e0a7ecc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 debugee.stderr> #5 0x7f1e0a7ed7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 debugee.stderr> #6 0x7f1e0a7ed8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 debugee.stderr> #7 0x7f1e0a57e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 debugee.stderr> #8 0x7f1e09bc4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 debugee.stderr> #9 0x7f1e0f9bf6e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) debugee.stderr> #10 0x7f1e0ef1550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) ------------- Commit messages: - JDK-8333149 Changes: https://git.openjdk.org/jdk/pull/19450/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19450&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333149 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19450.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19450/head:pull/19450 PR: https://git.openjdk.org/jdk/pull/19450 From cslucas at openjdk.org Wed May 29 09:39:09 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 29 May 2024 09:39:09 GMT Subject: Integrated: JDK-8324341 : Remove redundant preprocessor #if's checks In-Reply-To: References: Message-ID: On Fri, 24 May 2024 02:01:36 GMT, Cesar Soares Lucas wrote: > Can I please get some reviews for this change to remove some redundant #if / #ifdefs ? > > My search was just a simple grep + some bash script, though. 
I tested using JTREG on MacOS, Linux Mariner & Alpine from tier1 to 3. This pull request has now been integrated. Changeset: 6d718ae5 Author: Cesar Soares Lucas Committer: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/6d718ae51aeb7143ebfa561501b87fe1ba48039a Stats: 16 lines in 6 files changed: 0 ins; 16 del; 0 mod 8324341: Remove redundant preprocessor #if's checks Reviewed-by: kvn, ayang ------------- PR: https://git.openjdk.org/jdk/pull/19378 From ayang at openjdk.org Wed May 29 09:40:04 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 29 May 2024 09:40:04 GMT Subject: RFR: 8332936: Test vmTestbase/metaspace/gc/watermark_70_80/TestDescription.java fails with no GC's recorded In-Reply-To: References: Message-ID: On Tue, 28 May 2024 09:25:29 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change to exclude the watermark tests from use with -Xcomp. > > The failures reported are related to -Xcomp triggering the wrong kind of garbage collection pauses (CodeCache related GCs instead of Metadata related GCs) the test then fails on. > > The proposed solution is to just disable the tests with -Xcomp: the tests are not related to compilation at all. > > Testing: local, gha > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19421#pullrequestreview-2084872715 From aph at openjdk.org Wed May 29 10:00:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 10:00:08 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v9] In-Reply-To: References: Message-ID: On Fri, 10 May 2024 02:17:12 GMT, Jin Guojie wrote: >> 8331558: AArch64: optimize integer remainder >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. 
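Editorial sketch for the remark above. AArch64 has no integer remainder instruction, so a remainder is derived from the quotient; the combined form uses a divide followed by MSUB, while the separate form uses a divide, a multiply, and a subtract. The C++ below only spells out the arithmetic of the separate form. The function name is hypothetical, and the compiler, not this source, decides which instructions are actually emitted.

```c++
#include <cstdint>

// Hypothetical illustration: unsigned remainder as three separate steps.
// The caller must ensure b != 0, as with the built-in % operator.
constexpr std::uint32_t remainder_via_mul_sub(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t q = a / b;  // quotient (a divide instruction)
  const std::uint32_t p = q * b;  // separate multiply
  return a - p;                   // separate subtract; MSUB would fuse the last two steps
}
```

Because `q * b` never exceeds `a`, the subtraction cannot wrap, so the result matches `a % b` for every non-zero `b`.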
>> >> 8331556: AArch64: CPU_Model support for Neoverse N1/N2/V1/V2 >> Add full platform coverage for Neoverse variants in vm_version.?pp >> >> The following tests pass and show a definite performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2223 >> with this patch(ns/ops) 1885 >> improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned >> baseline(ns/ops) 2225 >> with this patch(ns/ops) 1885 >> improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned >> baseline(ns/ops) 2231 >> with this patch(ns/ops) 1894 >> improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned >> baseline(ns/ops) 2232 >> with this patch(ns/ops) 1891 >> improvement(%) 18.03% > > Jin Guojie has updated the pull request incrementally with two additional commits since the last revision: > > - Move big functions out of macroAssembler_aarch64.hpp > - Fix is_neoverse() > > These macros (CPU_MODEL_NEOVERSE_N1...) are definitions of is_model, not _cpu. This had to be pulled out because of regressions, but it's not hard to resubmit a correct version. @jinguojie-alibaba , are you interested in fixing this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19093#issuecomment-2137019782 From aph-open at littlepinkcloud.com Wed May 29 10:05:38 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Wed, 29 May 2024 11:05:38 +0100 Subject: [EXTERNAL] Re: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <10c34c7d-fedc-4a55-909c-28180fb74093@xpipe.io> References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> <10c34c7d-fedc-4a55-909c-28180fb74093@xpipe.io> Message-ID: <999d912a-68ad-4c5d-8b88-ef93d3b5d6f0@littlepinkcloud.com> On 5/29/24 09:23, Christopher Schnick wrote: > So is there any update on this? From the existing discussion, it was still not apparent whether the hotspot developers consider this being a problem that should be fixed properly. There were already a few possible solutions proposed in this thread. I don't think there were many that would pass a compatibility and specification review. "Give developers the option to unset these variables in the automatically generated launcher script for jlink" might well be OK, though. It'd be worth a try. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From azafari at openjdk.org Wed May 29 10:52:01 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 29 May 2024 10:52:01 GMT Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v2] In-Reply-To: References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com> <_M5SvhyN_E_8HUeamhiLJMp37flhjgTVE_X7t8jmPVc=.f86cbb23-9461-4013-83bf-d6b154b96cfd@github.com> <9XzKmn3xJvlbw4gz2vK_NZ6yOwfKB9VzHE6CBSz-73E=.dfaa5291-95b6-403d-b363-42131ebf4c4c@github.com> <8_W3gPFqX8RC7V2QvFSmKAOTEK4z6uHOf4NnA0RDp7A=.428d32d1-0624-48de-ac7b-0f1acc6a0a14@github.com> Message-ID: On Wed, 29 May 2024 07:39:24 GMT, Stefan Karlsson wrote: >> The call to `MemTracker::record_virtual_memory_split_reserved` at line 1364, takes two flags for the split parts. The corresponding regions in NMT take that flags. >> The sanity assertions will be added anyway. > > I think you are still missing my point. `record_virtual_memory_split_reserved` doesn't update the `archive_space_rs` and `class_space_rs` instances with the correct flag. Oh yes sorry. I was referring to the regions and flags in NMT. Since this PR is going to be closed without merge, no need to do any change in the code. But I remember it for future. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19343#discussion_r1618672203 From sspitsyn at openjdk.org Wed May 29 11:02:01 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 May 2024 11:02:01 GMT Subject: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp In-Reply-To: References: Message-ID: On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote: > In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea. > > It contains a workaround: > > > // lazily create the single threaded resource area > // pick a size which is not a standard since the pools don't exist yet > _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size); > > > It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet. > > This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment. > > --- > > Tests: GHAs. > I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life. Marked as reviewed by sspitsyn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19425#pullrequestreview-2085073625 From aph at openjdk.org Wed May 29 11:10:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 11:10:05 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: <7eML4nr0XN1_QVOO_2tk-yXf8W578S4qb1kA3AoaU8w=.81b03ff5-7ba8-496d-acfe-285ba3de2004@github.com> Message-ID: On Thu, 23 May 2024 05:59:03 GMT, kuaiwei wrote: >> Yes, usually they can be merged in macroAssembler. but it can help to reduce the possibility of unmerged case. Thanks to point it. > > I checked code again. They will be merged if enable AlwaysMergeDMB. So we can skip the check. Add a comment: `// These will be merged if AlwaysMergeDMB is enabled.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1618693680 From aph at openjdk.org Wed May 29 11:16:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 May 2024 11:16:03 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 11:05:49 GMT, Aleksey Shipilev wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Make MacroAssembler::merge more clear > > Cursory review: This looks ready to me. I think we need jcstress with C1 and C2, and we should be done. @shipilev , do you agree? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2137155885 From shade at openjdk.org Wed May 29 11:16:04 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 May 2024 11:16:04 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 11:05:49 GMT, Aleksey Shipilev wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Make MacroAssembler::merge more clear > > Cursory review: > This looks ready to me. I think we need jcstress with C1 and C2, and we should be done. @shipilev , do you agree? Yes. Just run jcstress with defaults, maybe limiting the time budget to about 24 hours, and we are done. Default configuration would work through different combinations of C1/C2 compilations for all actors, which is what we want to check for this change: that we don't mess up the barrier emitting scheme in different compilers/interpreters. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2137157664 From sspitsyn at openjdk.org Wed May 29 11:18:02 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 May 2024 11:18:02 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. 
> Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in 
VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... Looks good. Thank you for taking care of it. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19450#pullrequestreview-2085104582 From shade at openjdk.org Wed May 29 11:37:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 May 2024 11:37:03 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: On Mon, 27 May 2024 03:14:24 GMT, kuaiwei wrote: >> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: >> 1 It shows a regression on some platforms, such as Apple silicon on macOS >> 2 It cannot handle instruction sequences like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" >> >> They can be fixed by: >> 1 Enabling AlwaysMergeDMB by default and disabling it only on architectures where we see a performance improvement (N1 or N2) >> 2 Checking for the special pattern and merging the subsequent dmb. >> >> It also fixes a bug where st/ld/dmb cannot be merged while the code buffer is expanding. I added unit tests for these. >> >> This patch still has an unhandled case. For instructions like "dmb.ishld; dmb.ishst; dmb.ish", it merges the last 2 instructions but cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I think it's not a common pattern. >> >> In the previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation that uses a state machine for merging, but keeping instructions pending during emission looks risky. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Remove tailing white space Note that the current jcstress run would likely fail due to [JDK-8332670](https://bugs.openjdk.org/browse/JDK-8332670). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2137194120 From mdoerr at openjdk.org Wed May 29 12:24:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 29 May 2024 12:24:06 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. > Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() 
src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19450#pullrequestreview-2085244442 From mbaesken at openjdk.org Wed May 29 12:41:06 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 12:41:06 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. 
> Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in 
VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... Hi Martin and Serguei, thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19450#issuecomment-2137313538 From mbaesken at openjdk.org Wed May 29 12:41:07 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 29 May 2024 12:41:07 GMT Subject: Integrated: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. 
> Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in 
VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... This pull request has now been integrated. Changeset: 43a2f173 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/43a2f17342af8f5bf1f5823df9fa0bf0bdfdfce2 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage Reviewed-by: sspitsyn, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/19450 From zgu at openjdk.org Wed May 29 12:42:26 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Wed, 29 May 2024 12:42:26 GMT Subject: RFR: 8333129: Move ShrinkHeapInSteps flag to Serial GC Message-ID: A trivial change that moves Serial GC specific flag `ShrinkHeapInSteps` to `serial_globals.hpp` ------------- Commit messages: - 8333129: Move ShrinkHeapInSteps flag to Serial GC Changes: https://git.openjdk.org/jdk/pull/19452/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19452&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333129 Stats: 15 lines in 2 files changed: 4 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19452/head:pull/19452 PR: https://git.openjdk.org/jdk/pull/19452 From dholmes at openjdk.org Wed May 29 12:50:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 May 2024 12:50:04 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> Message-ID: On Wed, 29 May 2024 05:01:27 GMT, Calvin Cheung wrote: >> Yeah I'm not really getting the control aspects here. If I turn on logging I should not get these new counters unless I explicitly ask for them - simply turning on the logging should not set `ProfileClassLinkage` IMO. 
But enabling `ProfileClassLinkage` should turn on `init` logging, else it serves no purpose. > > We are planning to add more diagnostic flags to control different sets of counters. With the current design, the user just needs to specify `-Xlog:init` to enable all the "new" counters. If the `init` logging is enabled by individual flag, the user needs to enable individual flag in the command line. > Anyway, I think the follow would achieve what you are alluding to? > > if (FLAG_IS_CMDLINE(ProfileClassLinkage) && !log_is_enabled(Info, init)) { > LogConfiguration::configure_stdout(LogLevel::Info, true, LOG_TAGS(init)); > } > > I think it's better to keep the current change. This still seems convoluted to me. A -Xlog option shouldn't control anything but logging. If you want a set of counters enabled then use the flag to enable them, and separately use -Xlog:init to print them (though whether "init" is appropriate here is another matter). You could use -Xlog:init+foo to be more selective about which counters. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1618826710 From heidinga at openjdk.org Wed May 29 13:13:07 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Wed, 29 May 2024 13:13:07 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: On Sat, 25 May 2024 06:48:26 GMT, Ioi Lam wrote: >> ### Overview >> >> This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. 
>> >> I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, >> - `B` is the same class as `A`; or >> - `B` is a supertype of `A`; or >> - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. >> >> Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. >> >> Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. >> >> (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) >> >> ### Static CDS Archive >> >> This feature is implemented in three steps for static CDS archive dump: >> >> 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: >> >> @cp java/util/Objects 2 19 106 >> >> 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. >> >> 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. >> >> ### Dynamic CDS Archive >> >> When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. 
We only perform step 3 when the archive is being written. >> >> ### Limitations >> >> - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. >> - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Fixed typo in previous commit > - Merge branch 'master' into 8293980-resolve-fields-at-dumptime > - @matias9927 comments - moved remove_resolved_field_entries_if_non_deterministic() to cpCache > - Merge branch 'master' into 8293980-resolve-fields-at-dumptime > - 8293980: Resolve CONSTANT_FieldRef at CDS dump time make/GenerateLinkOptData.gmk line 68: > 66: # - The classlist can be influenced by locale. Always set it to en/US. > 67: # - Run with -Xint, as the compiler can speculatively resolve constant pool entries. > 68: # - ForkJoinPool parallelism can cause constant pool resolution to be non-dererministic. Minor typo Suggestion: # - ForkJoinPool parallelism can cause constant pool resolution to be non-deterministic. src/hotspot/share/cds/classListParser.cpp line 848: > 846: if (preresolve_fmi) { > 847: ClassPrelinker::preresolve_field_and_method_cp_entries(THREAD, ik, &preresolve_list); > 848: } Can you clarify the approach here? As I read the code, `ClassPrelinker::preresolve_class_cp_entries` will walk the whole constant pool looking for unresolved class entries that match and then resolve them. `ClassPrelinker::preresolve_field_and_method_cp_entries` walks all methods bytecode by bytecode to resolve them. 
Doesn't the `preresolve_list` already tell us which CP entries need to be resolved and the cp tag tell us the type of resolution to do? Can we not do this in a single pass over the cp rather than walking method bytecodes? Is the reason for this approach to avoid always resolving FieldMethodRefs for both get and put and only do them if there's a corresponding bytecode? src/hotspot/share/oops/instanceKlass.cpp line 2560: > 2558: // The ConstantPool is cleaned in a separate pass in ArchiveBuilder::make_klasses_shareable(), > 2559: // so no need to do it here. > 2560: //constants()->remove_unshareable_info(); Should this be deleted rather than commented out? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1617547809 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1618836313 PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1617818205 From stuefe at openjdk.org Wed May 29 13:14:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 29 May 2024 13:14:16 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Fri, 24 May 2024 06:13:54 GMT, Thomas Stuefe wrote: >>> We claim that: >>> >>> > Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. >>> >>> May I ask how you ran it? I would like to be able to reproduce our claim. >> >> Sure, it was a while since I ran the benchmark. You're going to have to do a bit of work here, to get it working. >> >> You take this file: https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L1 >> >> And you port it to the VMATree instead of VMADict (or whatever it's called). Then you run it and look at output. You could also take one of the stress tests that I made, remove the verification calls, and run the same stress test for VirtualMemoryTracker. 
> >> > We claim that: >> > > Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. >> > >> > >> > May I ask how you ran it? I would like to be able to reproduce our claim. >> >> Sure, it was a while since I ran the benchmark. You're going to have to do a bit of work here, to get it working. >> >> You take this file: https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L1 >> >> And you port it to the VMATree instead of VMADict (or whatever it's called). Then you run it and look at output. You could also take one of the stress tests that I made, remove the verification calls, and run the same stress test for VirtualMemoryTracker. > > The claim makes also sense if you think about it. A binary tree will always grossly outperform a linked list for sorted insert/delete. > Hi @tstuefe, @gerard-ziemski, @afshin-zafari > > What do we think is necessary to have this PR merged in? All tests green, and Oracle having run this through their CI. > > Right now, I know that Thomas has some gripes with the private/public API and visibility. I agree, it can be cleaned up, but can't this wait until after the PR is merged? Yes > I believe that there are multiple small clean ups and fixes that gets rid of some ugliness, but the actual functionality of this PR is over all well-tested. > > I see the following points as needing attention before merging: > > 1. NativeCallStackStorage -- needs some testing for both summary and detailed mode. _Maybe_ get the `bool is_detailed` out of there, but to me this is optional, it receives the info from `MemTracker` anyway, just through the constructor. > 2. The locking and reporting mechanisms. Is locking the MemoryFileTracker structures for the duration of the JCMD call acceptable? This means potential stalling of the VM, no? Well, its not worse than what we do now for VirtualMemoryTracker, no? 
That said, when I am careful I usually try to separate output (writing to an opaque outputStream* that can be god knows what) from querying information. The simplest way is to query info from MemoryFileTracker under lock protection and write the report to a stringStream first, then dump that one outside of lock protection to the real output stream. > 3. Run through some better/deeper testing than just GHA > > Is there anything that I am missing? This will have limited rollout to the subset of users using both ZGC and NMT. If you want to be super careful, give us a diagnostic option that can switch off the new VMATree feed if needed. That way we won't see ZGC footprint in the output, but in case there are problems, we have a quick solution. If we don't see anything bad happening, we can remove that switch again. -- I will take a last look at it at the start of next week. Tomorrow is a holiday, and I am busy with some other things. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2137378753 From rehn at openjdk.org Wed May 29 14:28:07 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 29 May 2024 14:28:07 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines Message-ID: Hi all, please consider! Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). With a very small application, or one running for a very short time, we have fast patchable calls. But any normal application running longer will increase the code size and code churn/fragmentation. So whether or not you get hot fast calls relies on luck. To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. This would be the common case for a patchable call. Code stream: JAL Stubs: AUIPC LD JALR On some CPUs L1D and L1I can't contain the same cache line, which means the trampoline stub can bounce from L1I->L1D->L1I, which is expensive. Even if you don't have that problem, having a call to a jump is not the fastest way.
Loading the address avoids the pitfalls of cmodx. This patch proposes to solve the problems with trampolines: we take a small penalty in the naive case of JAL to **dest**, and instead do by default: Code stream: AUIPC LD JALR Stubs: An experimental option for turning trampolines back on exists. It should be possible to enhance this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nopping out the auipc+ld when in reach, and vice versa (the current Zjid proposal forces the I-fetcher to fetch instructions in order, meaning we will avoid a lot of the issues which arm has). Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): fop (msec) 2239 | 2128 = 0.950424 h2 (msec) 18660 | 16594 = 0.889282 jython (msec) 22022 | 21925 = 0.995595 luindex (msec) 2866 | 2842 = 0.991626 lusearch (msec) 4108 | 4311 = 1.04942 lusearch-fix (msec) 4406 | 4116 = 0.934181 pmd (msec) 5976 | 5897 = 0.98678 jython (msec) 22022 | 21925 = 0.995595 Avg: 0.974112 fop(xcomp) (msec) 2721 | 2714 = 0.997427 h2(xcomp) (msec) 37719 | 38004 = 1.00756 jython(xcomp) (msec) 28563 | 29470 = 1.03175 luindex(xcomp) (msec) 5303 | 5512 = 1.03941 lusearch(xcomp) (msec) 6702 | 6271 = 0.935691 lusearch-fix(xcomp) (msec) 6721 | 6217 = 0.925011 pmd(xcomp) (msec) 6835 | 6587 = 0.963716 jython(xcomp) (msec) 28563 | 29470 = 1.03175 Avg: 0.99154 o.r.actors.JmhAkkaUct.run (ms/op) 8585.440 | 7548.347 = 0.879203 o.r.actors.JmhReactors.run (ms/op) 65004.694 | 64448.824 = 0.991449 o.r.jdk.concurrent.JmhFjKmeans.run (ms/op) 47751.653 | 45747.490 = 0.958029 o.r.jdk.concurrent.JmhFutureGenetic.run (ms/op) 12083.628 | 11427.650 = 0.945713 o.r.jdk.streams.JmhMnemonics.run (ms/op) 32691.025 | 31002.088 = 0.948336 o.r.jdk.streams.JmhParMnemonics.run (ms/op) 27500.431 | 23747.117 = 0.863518 o.r.jdk.streams.JmhScrabble.run (ms/op) 3688.182 | 3528.943 = 0.956825 o.r.neo4j.JmhNeo4jAnalytics.run (ms/op) 20153.371 | 21704.731 = 1.07698
o.r.rx.JmhRxScrabble.run (ms/op) 1197.749 | 1160.465 = 0.968872 o.r.scala.dotty.JmhDotty.run (ms/op) 18385.552 | 18561.341 = 1.00956 o.r.scala.sat.JmhScalaDoku.run (ms/op) 25243.887 | 22112.289 = 0.875946 o.r.scala.stdlib.JmhScalaKmeans.run (ms/op) 2610.509 | 2498.539 = 0.957108 o.r.scala.stm.JmhPhilosophers.run (ms/op) 5875.997 | 6101.689 = 1.03841 o.r.scala.stm.JmhScalaStmBench7.run (ms/op) 8723.122 | 8760.115 = 1.00424 o.r.twitter.finagle.JmhFinagleChirper.run (ms/op) 21209.541 | 21732.213 = 1.02464 o.r.twitter.finagle.JmhFinagleHttp.run (ms/op) 20782.221 | 20390.960 = 0.981173 Avg: 0.9675 It's been through a couple of t1-t3, but I need to re-run the tests after the latest merge. ------------- Commit messages: - Remove accidental files - Remove accidental files - Baseline Changes: https://git.openjdk.org/jdk/pull/19453/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19453&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332689 Stats: 802 lines in 15 files changed: 595 ins; 103 del; 104 mod Patch: https://git.openjdk.org/jdk/pull/19453.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19453/head:pull/19453 PR: https://git.openjdk.org/jdk/pull/19453 From lmesnik at openjdk.org Wed May 29 15:02:14 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 29 May 2024 15:02:14 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v5] In-Reply-To: <2Aorg4EW1Sl5s0tplzUb89ZNUeZg2xsPj3VkJQflzN4=.9072eee0-c481-4da9-ade9-5595ab78030f@github.com> References: <2Aorg4EW1Sl5s0tplzUb89ZNUeZg2xsPj3VkJQflzN4=.9072eee0-c481-4da9-ade9-5595ab78030f@github.com> Message-ID: On Tue, 28 May 2024 22:29:28 GMT, Leonid Mesnik wrote: >> The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. >> >> It happens when thread_name is set for tracing from jvmti functions.
>> See: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 >> >> The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. >> >> The change should affect JVMTI trace mode only (-XX:TraceJVMTI). >> >> Verified by running jvmti/jdi/jdb tests with tracing enabled. > > Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision: > > - fixed space. > - The result is updated. The name becomes too long in the logs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19275#issuecomment-2137627742 From lmesnik at openjdk.org Wed May 29 15:02:14 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 29 May 2024 15:02:14 GMT Subject: Integrated: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state In-Reply-To: References: Message-ID: On Fri, 17 May 2024 01:48:30 GMT, Leonid Mesnik wrote: > The JvmtiTrace::safe_get_thread_name sometimes crashes when called while current thread is in native thread state. > > It happens when thread_name is set for tracing from jvmti functions. > See: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/jvmtiEnter.xsl#L649 > > The setup is called and the thread name is used in tracing before the thread transition. There is no good location where this method could be called from vm thread_state only. Some functions like raw monitor enter/exit never transition in vm state. So sometimes it is needed to call this function from native thread state. > > The change should affect JVMTI trace mode only (-XX:TraceJVMTI). > > Verified by running jvmti/jdi/jdb tests with tracing enabled. This pull request has now been integrated. 
Changeset: 03b7a858 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/03b7a8586a77983b1851ddd3f4555fe2fca57919 Stats: 17 lines in 2 files changed: 16 ins; 0 del; 1 mod 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state Reviewed-by: dholmes, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/19275 From mli at openjdk.org Wed May 29 15:55:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 May 2024 15:55:15 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp Message-ID: Hi, could you help review this patch? Currently, code in nativeInst_riscv.cpp and code in macroAssembler_riscv.cpp call each other, which hurts readability and maintainability. After the refactoring, code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but basically not the other way around. The only exception is the code related to `membar`, where code in macroAssembler_riscv.cpp still calls code in nativeInst_riscv.cpp; I'm not sure whether it should be refactored too. Thanks! * Tests are still running, so far so good.
------------- Commit messages: - move instruct checking - move instruction_size Changes: https://git.openjdk.org/jdk/pull/19459/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332900 Stats: 697 lines in 6 files changed: 318 ins; 300 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/19459.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19459/head:pull/19459 PR: https://git.openjdk.org/jdk/pull/19459 From iklam at openjdk.org Wed May 29 16:07:20 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 29 May 2024 16:07:20 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v4] In-Reply-To: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: > ### Overview > > This PR archives `CONSTANT_FieldRef` entries in the _resolved_ state when it's safe to do so. > > I.e., when a `CONSTANT_FieldRef` constant pool entry in class `A` refers to a *non-static* field `B.F`, > - `B` is the same class as `A`; or > - `B` is a supertype of `A`; or > - `B` is one of the [vmClasses](https://github.com/openjdk/jdk/blob/3d4185a9ce482cc655a4c67f39cb2682b02ae4fe/src/hotspot/share/classfile/vmClasses.hpp), and `A` is loaded by the boot class loader. > > Under these conditions, it's guaranteed that whenever `A` tries to use this entry at runtime, `B` is guaranteed to have already been resolved in A's system dictionary, to the same value as resolved during dump time. > > Therefore, we can safely archive the `ResolvedFieldEntry` in class `A` that refers to `B.F`. > > (Note that we do not archive the `CONSTANT_FieldRef` entries for static fields, as the resolution of such entries can lead to class initialization at runtime. We plan to handle them in a future RFE.) 
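The conditions listed in the overview above can be read as a single predicate. The following is only an illustrative sketch: `KlassSketch`, `is_same_or_super` and `can_archive_resolved_field_ref` are hypothetical names invented here, the class model ignores interfaces, and none of this is the code in the PR.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of a loaded class.
struct KlassSketch {
  const KlassSketch* super;          // direct superclass, or nullptr
  bool is_vm_class;                  // stands in for "is one of the vmClasses"
  bool loaded_by_boot_loader;
};

// True if `b` is `a` itself or appears on a's superclass chain.
inline bool is_same_or_super(const KlassSketch* a, const KlassSketch* b) {
  for (const KlassSketch* k = a; k != nullptr; k = k->super) {
    if (k == b) {
      return true;
    }
  }
  return false;
}

// Mirrors the stated conditions for a FieldRef in class `a` that refers to
// field B.F: static fields are never archived in the resolved state; for
// non-static fields, B must be A, a supertype of A, or a VM class while A is
// loaded by the boot loader.
inline bool can_archive_resolved_field_ref(const KlassSketch* a,
                                           const KlassSketch* b,
                                           bool field_is_static) {
  if (field_is_static) {
    return false;
  }
  if (is_same_or_super(a, b)) {
    return true;
  }
  return b->is_vm_class && a->loaded_by_boot_loader;
}
```

Any resolved entry for which such a predicate is false is the kind of entry that would have to be reverted to the unresolved state before archiving.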
> > ### Static CDS Archive > > This feature is implemented in three steps for static CDS archive dump: > > 1. At the end of the training run, `ClassListWriter` iterates over all loaded classes and writes the indices of their resolved `Class` and `FieldRef` constant pool entries into the classlist file, with the `@cp` prefix. E.g., the following means that the constant pool entries at indices 2, 19 and 106 were resolved during the training run: > > @cp java/util/Objects 2 19 106 > > 2. When creating the static CDS archive from the classlist file, `ClassListParser` processes the `@cp` entries and resolves all the indicated entries. > > 3. Inside the `ArchiveBuilder::make_klasses_shareable()` function, we iterate over all entries in all archived `ConstantPools`. When we see a _resolved_ entry that does not satisfy the safety requirements as stated in _Overview_, we revert it back to the unresolved state. > > ### Dynamic CDS Archive > > When dumping the dynamic CDS archive, `ClassListWriter` and `ClassListParser` are not used, so steps 1 and 2 are skipped. We only perform step 3 when the archive is being written. > > ### Limitations > > - For safety, we limit this optimization to only classes loaded by the boot, platform, and app class loaders. This may be relaxed in the future. > - We archive only the constant pool entries that are actually resolved during the training run. We don't speculatively resolve other entries, as doing so may cause C2 to unnecessarily generate code for paths that are never taken by the app... 
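To make the `@cp` line format from step 1 above concrete, here is a small parsing sketch. It assumes only the layout shown in the example (the `@cp` tag, a class name, then whitespace-separated constant pool indices); `CpLineSketch` and `parse_cp_line` are hypothetical names and are unrelated to the real `ClassListParser`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: one "@cp" line of a classlist file.
struct CpLineSketch {
  std::string class_name;
  std::vector<int> indices;   // constant pool indices resolved in the training run
};

// Returns true and fills `out` if `line` looks like "@cp <class> <idx> <idx> ...".
// On failure `out` may be partially modified.
inline bool parse_cp_line(const std::string& line, CpLineSketch& out) {
  std::istringstream in(line);
  std::string tag;
  if (!(in >> tag) || tag != "@cp") {
    return false;
  }
  if (!(in >> out.class_name)) {
    return false;
  }
  out.indices.clear();
  int index = 0;
  while (in >> index) {
    out.indices.push_back(index);
  }
  // The loop must have stopped because the input ran out, not because of
  // a token that is not a number.
  return in.eof();
}
```

Parsing the example line `@cp java/util/Objects 2 19 106` with this sketch yields the class name `java/util/Objects` and the indices 2, 19 and 106.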
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @DanHeidinga comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19355/files - new: https://git.openjdk.org/jdk/pull/19355/files/89184c33..17a1ce62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19355&range=02-03 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19355/head:pull/19355 PR: https://git.openjdk.org/jdk/pull/19355 From iklam at openjdk.org Wed May 29 16:07:22 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 29 May 2024 16:07:22 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: On Wed, 29 May 2024 12:53:57 GMT, Dan Heidinga wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Fixed typo in previous commit >> - Merge branch 'master' into 8293980-resolve-fields-at-dumptime >> - @matias9927 comments - moved remove_resolved_field_entries_if_non_deterministic() to cpCache >> - Merge branch 'master' into 8293980-resolve-fields-at-dumptime >> - 8293980: Resolve CONSTANT_FieldRef at CDS dump time > > src/hotspot/share/cds/classListParser.cpp line 848: > >> 846: if (preresolve_fmi) { >> 847: ClassPrelinker::preresolve_field_and_method_cp_entries(THREAD, ik, &preresolve_list); >> 848: } > > Can you clarify the approach here? 
> > As I read the code, `ClassPrelinker::preresolve_class_cp_entries` will walk the whole constant pool looking for unresolved class entries that match and then resolve them. `ClassPrelinker::preresolve_field_and_method_cp_entries` walks all methods bytecode by bytecode to resolve them. > > Doesn't the `preresolve_list` already tell us which CP entries need to be resolved and the cp tag tell us the type of resolution to do? Can we not do this in a single pass over the cp rather than walking method bytecodes? > > Is the reason for this approach to avoid always resolving FieldMethodRefs for both get and put and only do them if there's a corresponding bytecode? `preresolve_list` has the original CP indices (E.g., `putfield #123` as stored in the classfile), but in HotSpot, after bytecode rewriting, the u2 following the bytecode is changed to an index into the `cpcache()->_resolved_field_entries` array, so it becomes something like `putfield #45`. So we need to know how to convert the `123` index to the `45` index. We could walk `_resolved_field_entries` to find the `ResolvedFieldEntry` whose `_cpool_index` is `123`. However, before the `ResolvedFieldEntry` is resolved, we don't know which bytecode is used to resolve it, so we don't know whether it's for a static field or non-static field. 
Since we want to filter out the static fields in this PR, we need to: - walk the bytecodes to find only getfield/putfield bytecodes - these bytecodes will give us an index to the `_resolved_field_entries` array - from there, we discover the original CP index - then we see if this index is set to true in `preresolve_list` There's also a compatibility issue -- it's possible to have static and non-static field access using the same CP index: class Hack { static int myField; int foo(boolean flag) { try { if (flag) { // throw IncompatibleClassChangeError return /* pseudo code*/ getfield this.myField; } else { // OK return /* pseudo code*/ getstatic Hack.myField; } } catch (Throwable t) { return 5678; } } } So we must call `InterpreterRuntime::resolve_get_put()` which performs all the checks for access rights, static-vs-non-static, etc. This call requires a Method parameter, so we must walk all the Methods to find an appropriate one. Perhaps it's possible to refactor the `InterpreterRuntime` code to avoid passing in a Method, but I am hesitant to do that with code that deals with access right checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1619144592 From gziemski at openjdk.org Wed May 29 16:18:16 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 29 May 2024 16:18:16 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v114] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 08:51:35 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> ``` >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Improve tests > - Use inner type def I still need to look deeper here, but I do not want to hold up the code if others are good with it. We can always address any issues I find later in follow ups... ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2085863399 From gziemski at openjdk.org Wed May 29 16:18:17 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 29 May 2024 16:18:17 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v105] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 13:09:46 GMT, Thomas Stuefe wrote: >>> > We claim that: >>> > > Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. >>> > >>> > >>> > May I ask how you ran it? I would like to be able to reproduce our claim. >>> >>> Sure, it was a while since I ran the benchmark. You're going to have to do a bit of work here, to get it working. >>> >>> You take this file: https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L1 >>> >>> And you port it to the VMATree instead of VMADict (or whatever it's called). Then you run it and look at output. You could also take one of the stress tests that I made, remove the verification calls, and run the same stress test for VirtualMemoryTracker. >> >> The claim makes also sense if you think about it. A binary tree will always grossly outperform a linked list for sorted insert/delete. > >> Hi @tstuefe, @gerard-ziemski, @afshin-zafari >> >> What do we think is necessary to have this PR merged in? > > All tests green, and Oracle having run this through their CI. > >> >> Right now, I know that Thomas has some gripes with the private/public API and visibility.
I agree, it can be cleaned up, but can't this wait until after the PR is merged?
>
> Yes
>
>> I believe that there are multiple small clean-ups and fixes that get rid of some ugliness, but the actual functionality of this PR is overall well-tested.
>>
>> I see the following points as needing attention before merging:
>>
>> 1. NativeCallStackStorage -- needs some testing for both summary and detailed mode. _Maybe_ get the `bool is_detailed` out of there, but to me this is optional, it receives the info from `MemTracker` anyway, just through the constructor.
>> 2. The locking and reporting mechanisms. Is locking the MemoryFileTracker structures for the duration of the JCMD call acceptable? This means potential stalling of the VM, no?
>
> Well, it's not worse than what we do now for VirtualMemoryTracker, no? That said, when I am careful I usually try to separate output (writing to an opaque outputStream* that can be god knows what) from querying information. The simplest way is to query info from MemoryFileTracker under lock protection and write the report to a stringStream first, then dump that one outside of lock protection to the real output stream.
>
>> 3. Run through some better/deeper testing than just GHA
>>
>> Is there anything that I am missing? This will have limited rollout to the subset of users using both ZGC and NMT.
>
> If you want to be super careful, give us a diagnostic option that can switch off the new VMATree feed if needed. That way we won't see ZGC footprint in the output, but in case there are problems, we have a quick solution. If we don't see anything bad happening, we can remove that switch again.
>
> --
>
> I will take a last look at it at the start of next week. Tomorrow is a holiday, and I am busy with some other things.

> Hi @tstuefe, @gerard-ziemski, @afshin-zafari
>
> What do we think is necessary to have this PR merged in?
>
> Right now, I know that Thomas has some gripes with the private/public API and visibility. I agree, it can be cleaned up, but can't this wait until after the PR is merged? I believe that there are multiple small clean-ups and fixes that get rid of some ugliness, but the actual functionality of this PR is overall well-tested.
>
> I see the following points as needing attention before merging:
>
> 1. NativeCallStackStorage -- needs some testing for both summary and detailed mode. _Maybe_ get the `bool is_detailed` out of there, but to me this is optional, it receives the info from `MemTracker` anyway, just through the constructor.
> 2. The locking and reporting mechanisms. Is locking the MemoryFileTracker structures for the duration of the JCMD call acceptable? This means potential stalling of the VM, no?
> 3. Run through some better/deeper testing than just GHA
>
> Is there anything that I am missing? This will have limited rollout to the subset of users using both ZGC and NMT.

I wanted to run some benchmarks and gain a deeper understanding of how the code works. I will not block this PR from moving forward; however, at this point I don't feel like I have enough understanding of the code to give a proper review yet. I can always catch up on it later after it's checked in.

If you do decide to check this in now, would you mind removing me as the reviewer (if you know how to do this)? I don't think I earned a reviewer credit on this one yet

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2137792276

From gziemski at openjdk.org Wed May 29 16:22:16 2024
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Wed, 29 May 2024 16:22:16 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v114]
In-Reply-To:
References:
Message-ID:

On Wed, 29 May 2024 08:51:35 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space.
This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
>> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: > > - Improve tests > - Use inner type def Trying to unglue my review (not sure how to drop my change request from my review...) ------------- PR Review: https://git.openjdk.org/jdk/pull/18289#pullrequestreview-2085871127 From sgibbons at openjdk.org Wed May 29 16:55:14 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 May 2024 16:55:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Move assert to where it's actually important. Thank you all for the comments. If there are no objections, I'll integrate these fixes tomorrow morning. I've run tier1-3 tests with the appropriate options on my machine with no errors, so my confidence is high. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2137861781 From jsjolen at openjdk.org Wed May 29 17:04:33 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 29 May 2024 17:04:33 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v115] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. 
Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. > > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. 
>
> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 233 commits:

 - Use file not device nomenclature
 - Perform the simplest possible summary of the memory reserved and committed
 - Merge remote-tracking branch 'openjdk/master' into nmt-physical-device
 - Visit all
 - Lower max
 - Improve tests
 - Use inner type def
 - Include memtracker
 - Assert on the tracking level
 - Naming fixing
 - ... and 223 more: https://git.openjdk.org/jdk/compare/43a2f173...9c37beeb

-------------

Changes: https://git.openjdk.org/jdk/pull/18289/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=114
  Stats: 2360 lines in 21 files changed: 2255 ins; 86 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From mli at openjdk.org Wed May 29 18:29:14 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 29 May 2024 18:29:14 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v2]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you help to review the patch?
> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintenance.
> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. The only exception is code related to `membar` where code in macroAssembler_riscv.cpp still calls code in nativeInst_riscv.cpp, I'm not sure whether it should be refactored too.
>
> Thanks!
>
> * Tests are still running, so far so good.

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  move membar

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19459/files
  - new: https://git.openjdk.org/jdk/pull/19459/files/f4df2a65..cd408fe0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=00-01

  Stats: 80 lines in 4 files changed: 37 ins; 41 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19459.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19459/head:pull/19459

PR: https://git.openjdk.org/jdk/pull/19459

From mli at openjdk.org Wed May 29 18:54:27 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 29 May 2024 18:54:27 GMT
Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you help to review the patch?
> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintenance.
> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. The only exception is code related to `membar` where code in macroAssembler_riscv.cpp still calls code in nativeInst_riscv.cpp, I'm not sure whether it should be refactored too.
>
> Thanks!
>
> * Tests are still running, so far so good.

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: restrict accessbility ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19459/files - new: https://git.openjdk.org/jdk/pull/19459/files/cd408fe0..fe345dd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=01-02 Stats: 77 lines in 1 file changed: 42 ins; 35 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19459.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19459/head:pull/19459 PR: https://git.openjdk.org/jdk/pull/19459 From kbarrett at openjdk.org Wed May 29 18:57:09 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 29 May 2024 18:57:09 GMT Subject: RFR: 8333133: Simplify QuickSort::sort Message-ID: The "idempotent" argument is removed from that function, with associated simplifications to the implementation. Callers are updated to remove that argument. Callers that were providing a false value are unaffected in their behavior. The 3 callers that were providing a true value to request the associated feature are also unaffected (other than by being made faster), because the arrays involved don't contain any equivalent pairs. There are also some miscellaneous cleanups, including using the swap utility and fixing some comments. 
Testing: mach5 tier1-3 ------------- Commit messages: - remove idempotent Changes: https://git.openjdk.org/jdk/pull/19464/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19464&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333133 Stats: 125 lines in 11 files changed: 3 ins; 95 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/19464.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19464/head:pull/19464 PR: https://git.openjdk.org/jdk/pull/19464 From cjplummer at openjdk.org Wed May 29 19:11:03 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 29 May 2024 19:11:03 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes In-Reply-To: References: Message-ID: On Tue, 28 May 2024 22:24:53 GMT, Serguei Spitsyn wrote: > Please, review the following `interp-only` issue related to carrier threads. > There are 3 problems fixed here: > - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. > - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. > - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. > > The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. 
> > Testing: > - Ran new test case locally > - Ran mach5 tiers 1-6 test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp line 201: > 199: > 200: // need to reset this value after the breakpoint_hit1 > 201: received_method_exit_event = JNI_FALSE; There was a loom-dev email thread regarding this last year. Seems related. I had concluded that the way the test was written that no MethodExit event should have been received. I'm not sure if I missed something in my analysis or if this failure is a result of your changes: https://mail.openjdk.org/pipermail/loom-dev/2023-August/006059.html https://mail.openjdk.org/pipermail/loom-dev/2023-September/006170.html ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619356206 From kvn at openjdk.org Wed May 29 21:39:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 May 2024 21:39:14 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Move assert to where it's actually important. Let me test the latest version before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2138303300 From kvn at openjdk.org Wed May 29 21:44:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 29 May 2024 21:44:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v47] In-Reply-To: References: Message-ID: <-jpTM1HhjURGU9BNxceoaF1OlfoVla_Jlnj9BYVCOTQ=.088cff2a-eb4d-43a1-8072-4b688af1d244@github.com> On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Move assert to where it's actually important. 
test/jdk/TEST.ROOT line 103:

> 101: vm.jvmti \
> 102: vm.cpu.features \
> 103: vm.compiler2.enabled \

`vm.compiler2.enabled` already listed at line 91

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1619506711

From duke at openjdk.org Wed May 29 21:46:08 2024
From: duke at openjdk.org (duke)
Date: Wed, 29 May 2024 21:46:08 GMT
Subject: Withdrawn: 8320794: Emulate rest of vblendvp[sd] on ECore
In-Reply-To: <8ajDeYtrlyZUXnTl29xwLr1rwGIYzjj5wThm9yjrBVY=.c75c1992-c836-4969-aea4-e3cbf428dfad@github.com>
References: <8ajDeYtrlyZUXnTl29xwLr1rwGIYzjj5wThm9yjrBVY=.c75c1992-c836-4969-aea4-e3cbf428dfad@github.com>
Message-ID:

On Thu, 14 Mar 2024 19:02:17 GMT, Volodymyr Paprotski wrote:

> Replace vpblendvp[sd] with macro assembler call and test in:
> - `C2_MacroAssembler::vector_cast_float_to_int_special_cases_avx`
> - `C2_MacroAssembler::vector_cast_double_to_int_special_cases_avx`
> - `C2_MacroAssembler::vector_count_leading_zeros_int_avx`
>
> Functional testing with existing and new tests:
> `make test TEST="test/hotspot/jtreg/compiler/vectorapi/reshape test/hotspot/jtreg/compiler/vectorization/runner/BasicIntOpTest.java"`
>
> Benchmarking with existing and new tests:
>
> make test TEST="micro:org.openjdk.bench.jdk.incubator.vector.VectorFPtoIntCastOperations.microFloat256ToInteger256"
> make test TEST="micro:org.openjdk.bench.jdk.incubator.vector.VectorFPtoIntCastOperations.microDouble256ToInteger256"
> make test TEST="micro:org.openjdk.bench.vm.compiler.VectorBitCount.WithSuperword.intLeadingZeroCount"
>
>
> Performance before:
>
> Benchmark (SIZE) Mode Cnt Score Error Units
> VectorFPtoIntCastOperations.microDouble256ToInteger256 512 thrpt 5 17271.078 ± 184.140 ops/ms
> VectorFPtoIntCastOperations.microDouble256ToInteger256 1024 thrpt 5 9310.507 ± 88.136 ops/ms
> VectorFPtoIntCastOperations.microFloat256ToInteger256 512 thrpt 5 11137.594 ± 19.009 ops/ms
> VectorFPtoIntCastOperations.microFloat256ToInteger256 1024 thrpt 5 5425.001 ± 3.136 ops/ms
> VectorBitCount.WithSuperword.intLeadingZeroCount 1024 0 thrpt 4 0.994 ± 0.002 ops/us
>
>
> Performance after:
>
> Benchmark (SIZE) Mode Cnt Score Error Units
> VectorFPtoIntCastOperations.microDouble256ToInteger256 512 thrpt 5 19222.048 ± 87.622 ops/ms
> VectorFPtoIntCastOperations.microDouble256ToInteger256 1024 thrpt 5 9233.245 ± 123.493 ops/ms
> VectorFPtoIntCastOperations.microFloat256ToInteger256 512 thrpt 5 11672.806 ± 10.854 ops/ms
> VectorFPtoIntCastOperations.microFloat256ToInteger256 1024 thrpt 5 6009.735 ± 12.173 ops/ms
> VectorBitCount.WithSuperword.intLeadingZeroCount 1024 0 thrpt 4 1.039 ± 0.004 ops/us

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18310

From sgibbons at openjdk.org Wed May 29 22:20:31 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Wed, 29 May 2024 22:20:31 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
In-Reply-To:
References:
Message-ID:

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Remove duplicate vm.compiler2.enabled ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/db0ab75a..ed06edd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=46-47 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed May 29 22:20:31 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 May 2024 22:20:31 GMT Subject: RFR: 8320448: Accelerate IndexOf 
using AVX2 [v47] In-Reply-To: <-jpTM1HhjURGU9BNxceoaF1OlfoVla_Jlnj9BYVCOTQ=.088cff2a-eb4d-43a1-8072-4b688af1d244@github.com> References: <-jpTM1HhjURGU9BNxceoaF1OlfoVla_Jlnj9BYVCOTQ=.088cff2a-eb4d-43a1-8072-4b688af1d244@github.com> Message-ID: On Wed, 29 May 2024 21:41:42 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Move assert to where it's actually important. > > test/jdk/TEST.ROOT line 103: > >> 101: vm.jvmti \ >> 102: vm.cpu.features \ >> 103: vm.compiler2.enabled \ > > `vm.compiler2.enabled ` already listed at line 91 Thanks! Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1619532884 From ccheung at openjdk.org Thu May 30 00:32:19 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 30 May 2024 00:32:19 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v5] In-Reply-To: References: Message-ID: > Adding a few perf counters related to class loading to measure VM startup. The counters are only active if the user specifies `-Xlog:init` in the command line. A diagnostic flag `ProfileClassLinkage` is added to control the new counters. The flag is set to false by default and will be enabled if `-Xlog:init` is specified. > > This change is already in the leyden/premain branch. There are more counters in the branch to measure other stuff. For now, just upstreaming class loader related counters. > > Refer to the [comment](https://bugs.openjdk.org/browse/JDK-8330198?focusedId=14665311&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14665311) in the bug report for an example output. > > Passed tiers 1 - 4 testing. 
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: more comments from @dholmes-ora ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18790/files - new: https://git.openjdk.org/jdk/pull/18790/files/209c4662..7dd08a32 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18790&range=03-04 Stats: 40 lines in 8 files changed: 8 ins; 7 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/18790.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18790/head:pull/18790 PR: https://git.openjdk.org/jdk/pull/18790 From ccheung at openjdk.org Thu May 30 00:32:19 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 30 May 2024 00:32:19 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v3] In-Reply-To: References: <7yfsvM0ff6gBYLefpro2qTcEMBmCOHd3YICcygItlZs=.d900a439-4932-46e6-b287-d1bf2789f195@github.com> Message-ID: On Wed, 29 May 2024 12:47:20 GMT, David Holmes wrote: >> We are planning to add more diagnostic flags to control different sets of counters. With the current design, the user just needs to specify `-Xlog:init` to enable all the "new" counters. If the `init` logging is enabled by individual flag, the user needs to enable individual flag in the command line. >> Anyway, I think the follow would achieve what you are alluding to? >> >> if (FLAG_IS_CMDLINE(ProfileClassLinkage) && !log_is_enabled(Info, init)) { >> LogConfiguration::configure_stdout(LogLevel::Info, true, LOG_TAGS(init)); >> } >> >> I think it's better to keep the current change. > > This still seems convoluted to me. A -Xlog option shouldn't control anything but logging. If you want a set of counters enabled then use the flag to enable them, and separately use -Xlog:init to print them (though whether "init" is appropriate here is another matter). 
You could use -Xlog:init+foo to be more selective about which counters. I've modified the fix so that the user needs to specify both `-Xlog:init` and `-XX:+ProfileClassLinkage` for the counters to be printed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1619628682 From ccheung at openjdk.org Thu May 30 00:32:19 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 30 May 2024 00:32:19 GMT Subject: RFR: 8330198: Add some class loading related perf counters to measure VM startup [v4] In-Reply-To: References: <7AWghiG_TSVMjkfVfA_krBMWZNMRVlakI7kny1tuJ9s=.d4ca3b29-923a-48e6-80d7-97c72ea6e308@github.com> Message-ID: On Wed, 29 May 2024 07:26:51 GMT, David Holmes wrote: >> This function will cover other sets of counters in the future. Maybe changing its name to `log_vm_stats`? > > Regardless there seems to be confusion about which method should be responsible for checking if the requisite logging is enabled. They should not both do it. I've changed `log_vm_stats` to check if both `-Xlog:init` and `-XX:+ProfileClassLinkage` are enabled before calling `ClassLoader::print_counters`. In `print_counters`, assert statements are added to ensure both are enabled. >> If using `tty`, the output would lose the logging tag. The output would look as follows: >> >> ClassLoader: >> clinit: 11ms / 285 events >> link methods: 13ms / 7493 events >> method adapters: 12ms / 571 events >> >> versus with logging tag: >> >> [0.094s][info][init] ClassLoader: >> [0.094s][info][init] clinit: 11ms / 278 events >> [0.094s][info][init] link methods: 13ms / 7336 events >> [0.094s][info][init] method adapters: 12ms / 571 events > > Yes I understand that, but this method is generally printing a ton of stuff to the tty - that is what it is for. If we want to add such stuff to the output then it too should just go to the tty - else it doesn't belong in this method IMO. 
With my new modified fix, the `log_vm_stats` accepts a `outputStream *st` argument so that in `java.cpp` a `tty` would be passed in. In `threads.cpp`, a `LogStreamHandle` could be passed in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1619630383 PR Review Comment: https://git.openjdk.org/jdk/pull/18790#discussion_r1619630458 From amenkov at openjdk.org Thu May 30 01:09:03 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 30 May 2024 01:09:03 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes In-Reply-To: References: Message-ID: On Tue, 28 May 2024 22:24:53 GMT, Serguei Spitsyn wrote: > Please, review the following `interp-only` issue related to carrier threads. > There are 3 problems fixed here: > - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. > - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. > - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. > > The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. > > Testing: > - Ran new test case locally > - Ran mach5 tiers 1-6 test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/CarrierThreadEventNotification.java line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. 
(c) 2024 test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. (c) 2024 test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 40: > 38: > 39: static const char* CTHREAD_NAME_START = "ForkJoinPool"; > 40: static const int CTHREAD_NAME_START_LEN = (int)strlen("ForkJoinPool"); should be `size_t` (the value is used for `strncmp` which expects `size_t`) test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 44: > 42: static jint > 43: get_cthreads(JNIEnv* jni, jthread** cthreads_p) { > 44: jthread* tested_cthreads = NULL; Suggestion: jthread* tested_cthreads = nullptr; test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 44: > 42: static jint > 43: get_cthreads(JNIEnv* jni, jthread** cthreads_p) { > 44: jthread* tested_cthreads = NULL; This local variable has the same name as global. 
I'd suggest to rename the local var or remove it (and the function should set both `tested_cthreads` and ` cthread_cnt`) test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 91: > 89: for (int i = 0; i < cthread_cnt; i++) { > 90: jthread thread = tested_cthreads[i]; > 91: char* tname = get_thread_name(jvmti, jni, thread); `tname` is not needed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619642814 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619642981 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619643931 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619660102 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619662506 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619665290 From sspitsyn at openjdk.org Thu May 30 02:04:02 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 May 2024 02:04:02 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes In-Reply-To: References: Message-ID: On Thu, 30 May 2024 00:41:28 GMT, Alex Menkov wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. 
>> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. >> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. >> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/CarrierThreadEventNotification.java line 2: > >> 1: /* >> 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > > (c) 2024 Fixed, thanks. > test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > > (c) 2024 Fixed, thanks. > test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 40: > >> 38: >> 39: static const char* CTHREAD_NAME_START = "ForkJoinPool"; >> 40: static const int CTHREAD_NAME_START_LEN = (int)strlen("ForkJoinPool"); > > should be `size_t` (the value is used for `strncmp` which expects `size_t`) Fixed, thanks. > test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 44: > >> 42: static jint >> 43: get_cthreads(JNIEnv* jni, jthread** cthreads_p) { >> 44: jthread* tested_cthreads = NULL; > > Suggestion: > > jthread* tested_cthreads = nullptr; Fixed, thanks. 
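On the `size_t` point above, a minimal sketch of the idea, assuming the constant is only ever used as the length argument of `strncmp`. The helper `is_carrier_thread_name` is hypothetical and is not taken from the test; only the two constant names come from the quoted code.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Prefix used to recognize carrier threads (name taken from the quoted test code).
static const char* CTHREAD_NAME_START = "ForkJoinPool";

// strlen() returns size_t and strncmp() takes size_t as its length argument,
// so keeping the constant as size_t avoids the (int) cast and any
// signed/unsigned conversion at the call site.
static const size_t CTHREAD_NAME_START_LEN = strlen(CTHREAD_NAME_START);

// Hypothetical helper, for illustration only: returns true if the thread
// name begins with the carrier-thread prefix.
static bool is_carrier_thread_name(const char* tname) {
  assert(tname != nullptr);
  return strncmp(tname, CTHREAD_NAME_START, CTHREAD_NAME_START_LEN) == 0;
}
```

With this shape, a call such as `is_carrier_thread_name("ForkJoinPool-1-worker-1")` compares only the first `CTHREAD_NAME_START_LEN` characters, and no cast is needed anywhere.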
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619716426 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619716604 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619717881 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619718490 From dholmes at openjdk.org Thu May 30 02:08:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 May 2024 02:08:10 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: <8u0gQV87R2tRvHGmuMB94iM8VR9bIC3t_acofhGNG1E=.8bc0312d-7c87-4e38-891c-ca1f3034a4c7@github.com> On Wed, 29 May 2024 12:38:21 GMT, Matthias Baesken wrote: >> When running with ubsan - enabled binaries (--enable-ubsan), >> in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . >> >> // null out memory for robustness >> memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); >> memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); >> >> probably we should add checks there. 
>> Example : >> vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr >> >> debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null >> debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 >> debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 >> debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 >> debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 >> debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 >> debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 >> debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 >> debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 >> debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) >> debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) >> >> vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr >> >> debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null >> debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 >> debugee.stderr> 
#1 0x7f1e09387bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 >> debugee.std... > > Hi Martin and Serguei, thanks for the reviews ! @MBaesken This was not proposed as a trivial PR and so is subject to the 24 hour rule. Please don't push these ubsan "fixes" quickly as we need time to assess their validity and the right way to address them. This fix looks wrong to me because those values cannot be null as it implies the `allocate` function failed which means we would not reach this code! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19450#issuecomment-2138540409 From sspitsyn at openjdk.org Thu May 30 02:13:01 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 May 2024 02:13:01 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes In-Reply-To: References: Message-ID: On Thu, 30 May 2024 01:03:01 GMT, Alex Menkov wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. >> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. 
>> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 44: > >> 42: static jint >> 43: get_cthreads(JNIEnv* jni, jthread** cthreads_p) { >> 44: jthread* tested_cthreads = NULL; > > This local variable has the same name as global. > I'd suggest to rename the local var or remove it (and the function should set both `tested_cthreads` and ` cthread_cnt`) Thanks. Renamed the local to `cthreads` and the global to `carrier_threads`. > test/hotspot/jtreg/serviceability/jvmti/vthread/CarrierThreadEventNotification/libCarrierThreadEventNotification.cpp line 91: > >> 89: for (int i = 0; i < cthread_cnt; i++) { >> 90: jthread thread = tested_cthreads[i]; >> 91: char* tname = get_thread_name(jvmti, jni, thread); > > `tname` is not needed Removed, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619726804 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619728368 From kvn at openjdk.org Thu May 30 02:21:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 02:21:13 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Remove duplicate vm.compiler2.enabled My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2086978326 From sspitsyn at openjdk.org Thu May 30 02:31:29 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 May 2024 02:31:29 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v2] In-Reply-To: References: Message-ID: <55rWd_Kn3Jf8kfmkMtVnzRVs_o0KK_jnuZthiS9awDA=.555b5928-38d1-422c-9014-7d4cf31a950d@github.com> > Please, review the following `interp-only` issue related to carrier threads. 
> There are 3 problems fixed here: > - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. > - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. > - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. > > The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. > > Testing: > - Ran new test case locally > - Ran mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: addressed nits in new test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19438/files - new: https://git.openjdk.org/jdk/pull/19438/files/a0f5d278..2f75975f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19438&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19438&range=00-01 Stats: 18 lines in 2 files changed: 0 ins; 2 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19438/head:pull/19438 PR: https://git.openjdk.org/jdk/pull/19438 From sspitsyn at openjdk.org Thu May 30 02:44:06 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 May 2024 02:44:06 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v2] In-Reply-To: References: Message-ID: <7D1Cchdl8jpFGHWJq0YLCELHQGXz6OLpkxHdLahhgmA=.4b815259-ba39-4ecb-9819-585c0123fca5@github.com> On Wed, 29 May 2024 19:06:57 
GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: addressed nits in new test > > test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp line 201: > >> 199: >> 200: // need to reset this value after the breakpoint_hit1 >> 201: received_method_exit_event = JNI_FALSE; > > There was a loom-dev email thread regarding this last year. Seems related. I had concluded that the way the test was written that no MethodExit event should have been received. I'm not sure if I missed something in my analysis or if this failure is a result of your changes: > > https://mail.openjdk.org/pipermail/loom-dev/2023-August/006059.html > https://mail.openjdk.org/pipermail/loom-dev/2023-September/006170.html Thank you for the comment and links to the discussion. In fact, I've observed the MethodExit events really posted between the breakpoint hits: `hit1` and `hit2`. The first one is at the return from the `unmount()` method. I was not able to prove why they should not be expected. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1619756552 From heidinga at openjdk.org Thu May 30 04:18:06 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Thu, 30 May 2024 04:18:06 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: On Wed, 29 May 2024 16:03:35 GMT, Ioi Lam wrote: > We could walk `_resolved_field_entries` to find the `ResolvedFieldEntry` whose `_cpool_index` is `123`. However, before the `ResolvedFieldEntry` is resolved, we don't know which bytecode is used to resolve it, so we don't know whether it's for a static field or non-static field. 
Since we want to filter out the static fields in the PR, we need to: > > * walk the bytecodes to find only getfield/putfield bytecodes > * these bytecodes will give us an index to the `_resolved_field_entries` array > * from there, we discover the original CP index > * then we see if this index is set to true in `preresolve_list` Something's been bothering me about this explanation and I think I've put my finger on it. As you show, the same CP entry can be referenced by both `getstatic` & `getfield` bytecodes though only one will successfully resolve. Walking the bytecodes doesn't actually tell us anything - the resolution status should be different for instance vs static fields, which means we should always be safe to attempt the resolution of fields as instance fields provided we ignore errors. > So we must call `InterpreterRuntime::resolve_get_put()` which performs all the checks for access rights, static-vs-non-static, etc. This call requires a Method parameter, so we must walk all the Methods to find an appropriate one. The Method parameter is necessary for puts to final fields - either `<clinit>` for static finals or an `<init>` method for instance finals. In either case, we don't actually resolve the field for puts, so it doesn't matter if we pass the "correct" method or not during preresolution as it will never successfully complete.
I think we'd be OK to send any method we want to that call when doing preresolution provided we ignore the errors ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1619844681 From david.holmes at oracle.com Thu May 30 04:30:51 2024 From: david.holmes at oracle.com (David Holmes) Date: Thu, 30 May 2024 14:30:51 +1000 Subject: [EXTERNAL] Re: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <999d912a-68ad-4c5d-8b88-ef93d3b5d6f0@littlepinkcloud.com> References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> <10c34c7d-fedc-4a55-909c-28180fb74093@xpipe.io> <999d912a-68ad-4c5d-8b88-ef93d3b5d6f0@littlepinkcloud.com> Message-ID: <567cd5e4-f0d4-4c69-be66-2e220dc640eb@oracle.com> On 29/05/2024 8:05 pm, Andrew Haley wrote: > On 5/29/24 09:23, Christopher Schnick wrote: > > So is there any update on this? From the existing discussion, it was > still not apparent whether the hotspot developers consider this being a > problem that should be fixed properly. There were already a few possible > solutions proposed in this thread. > > I don't think there were many that would pass a compatibility and > specification review. "Give developers the option to unset these > variables in the automatically generated launcher script for jlink" > might well be OK, though. It'd be worth a try. I also think this is something that we should see about fixing in jlink, such that the problematic env-vars are omitted. I'm less inclined to support the suggestion that a new flag be added to hotspot that tells it to ignore the env vars, as you will need to add it in jlink anyway. But again I am not familiar with jlink and the jlink developers do not generally hang out on hotspot-dev. So I would suggest filing a JBS issue against jlink or starting a discussion on ... core-libs-dev? 
David From duke at openjdk.org Thu May 30 05:37:30 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 30 May 2024 05:37:30 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder Message-ID: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. (1) The following test has passed, which shows performance improvement. make test TEST="micro:java.lang.IntegerDivMod" make test TEST="micro:java.lang.LongDivMod" * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this patch(ns/ops) 1885 improvement(%) 17.93% * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this patch(ns/ops) 1885 improvement(%) 18.03% * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this patch(ns/ops) 1894 improvement(%) 17.79% * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this patch(ns/ops) 1891 improvement(%) 18.03% (2) jtreg test has passed make run-test
TEST=tier1 ------------- Commit messages: - 8331558: AArch64: optimize integer remainder Changes: https://git.openjdk.org/jdk/pull/19471/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19471&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331558 Stats: 73 lines in 4 files changed: 58 ins; 9 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19471/head:pull/19471 PR: https://git.openjdk.org/jdk/pull/19471 From crschnick at xpipe.io Thu May 30 05:51:34 2024 From: crschnick at xpipe.io (Christopher Schnick) Date: Thu, 30 May 2024 07:51:34 +0200 Subject: [EXTERNAL] Re: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <567cd5e4-f0d4-4c69-be66-2e220dc640eb@oracle.com> References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> <10c34c7d-fedc-4a55-909c-28180fb74093@xpipe.io> <999d912a-68ad-4c5d-8b88-ef93d3b5d6f0@littlepinkcloud.com> <567cd5e4-f0d4-4c69-be66-2e220dc640eb@oracle.com> Message-ID: <9532fa6e-fcba-49ab-a965-762e3056869b@xpipe.io> Alright I see your points. I can definitely crosspost this thread to the core libs mailing list. The only case in which I see this still being mainly a hotspot issue is if there is more global configuration creeping into runtime images apart from environment variables. Is there any other global configuration data always sourced that I'm not aware of like registry values, Java Control Panel settings (is that even still around?), other global configuration files, etc.? On 30/05/2024 06:30, David Holmes wrote: > On 29/05/2024 8:05 pm, Andrew Haley wrote: >> On 5/29/24 09:23, Christopher Schnick wrote: >> > So is there any update on this? From the existing discussion, it >> was still not apparent whether the hotspot developers consider this >> being a problem that should be fixed properly.
There were already a >> few possible solutions proposed in this thread. >> >> I don't think there were many that would pass a compatibility and >> specification review. "Give developers the option to unset these >> variables in the automatically generated launcher script for jlink" >> might well be OK, though. It'd be worth a try. > > I also think this is something that we should see about fixing in > jlink, such that the problematic env-vars are omitted. I'm less > inclined to support the suggestion that a new flag be added to hotspot > that tells it to ignore the env vars, as you will need to add it in > jlink anyway. > > But again I am not familiar with jlink and the jlink developers do not > generally hang out on hotspot-dev. So I would suggest filing a JBS > issue against jlink or starting a discussion on ... core-libs-dev? > > David > From dholmes at openjdk.org Thu May 30 06:06:09 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 May 2024 06:06:09 GMT Subject: RFR: 8332259: JvmtiTrace::safe_get_thread_name fails if current thread is in native state [v5] In-Reply-To: References: <2Aorg4EW1Sl5s0tplzUb89ZNUeZg2xsPj3VkJQflzN4=.9072eee0-c481-4da9-ade9-5595ab78030f@github.com> Message-ID: On Wed, 29 May 2024 01:18:57 GMT, Serguei Spitsyn wrote: >> Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixed space. >> - The result is updated. > > src/hotspot/share/prims/jvmtiTrace.cpp line 284: > >> 282: JavaThreadState current_state = JavaThread::cast(Thread::current())->thread_state(); >> 283: if (current_state == _thread_in_native || current_state == _thread_blocked) { >> 284: return "not readable"; > > Nit: I'd suggest to make it more detailed, something like this: > "" or "" Yes this would have looked better if the text was more clearly an error message with angle brackets.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19275#discussion_r1619989123 From thartmann at openjdk.org Thu May 30 06:28:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 06:28:20 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark                               Score       Latest      Ratio
>> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
>> StringIndexOf.success                   9.186       9.713       1.057369911x
>> StringIndexOf.successBig                14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Remove duplicate vm.compiler2.enabled Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to
JDK 24? The fork is next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2138771509 From epeter at openjdk.org Thu May 30 06:28:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 06:28:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark                               Score       Latest      Ratio
>> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
>> StringIndexOf.success                   9.186       9.713       1.057369911x
>> StringIndexOf.successBig                14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Remove duplicate vm.compiler2.enabled test/jdk/java/lang/String/IndexOf.java line 35: > 33: * @requires vm.cpu.features ~=
".*avx2.*" > 34: * @requires vm.compiler2.enabled > 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf Same here: why is the test AVX2 specific? Could other platforms not also be "tickled" in interesting ways with this test? test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: > 27: * @requires vm.cpu.features ~= ".*avx2.*" > 28: * @requires vm.compiler2.enabled > 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf Does this test really need to be `avx2` specific? Does it even need to be C2 specific? Or can this run on all platforms? test/jdk/java/lang/StringBuffer/IndexOf.java line 188: > 186: } > 187: > 188: } It looks like you just indented basically the whole file by 1 space. Why? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620019084 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620016717 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620013302 From epeter at openjdk.org Thu May 30 06:28:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 06:28:21 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> On Thu, 30 May 2024 06:21:36 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: > >> 27: * @requires vm.cpu.features ~= ".*avx2.*" >> 28: * @requires vm.compiler2.enabled >> 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:UseAVX=2 
-Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf > > Does this test really need to be `avx2` specific? Does it even need to be C2 specific? > Or can this run on all platforms? Would be a shame to spend so much time on writing a test and then not apply it everywhere ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620017891 From dholmes at openjdk.org Thu May 30 07:10:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 May 2024 07:10:02 GMT Subject: RFR: 8333129: Move ShrinkHeapInSteps flag to Serial GC In-Reply-To: References: Message-ID: <9r2Ogmj5whqtm0rlT0ecXR_SoZmc6pfcVOtnc273XTo=.87531f95-a7b6-4d67-8133-f8c180dd34e9@github.com> On Wed, 29 May 2024 12:36:40 GMT, Zhengyu Gu wrote: > A trivial change that moves Serial GC specific flag `ShrinkHeapInSteps` to `serial_globals.hpp` @zhengyu123 I don't think we can do this in quite such a direct way. This flag was added in JDK 9 under [JDK-8146436](https://bugs.openjdk.org/browse/JDK-8146436) and applied to all GC's AFAICS. Over time it seems to have been relegated to only working with SerialGC, but I can still find articles that reference it for GC tuning e.g. https://docs.oracle.com/en/java/javase/22/gctuning/factors-affecting-garbage-collection-performance.html So if this is indeed only for SerialGC now then we need to check when it stopped applying elsewhere and whether all the relevant docs have been updated. Then I think we would need to deprecate it for non-Serial (which is tricky because the flag deprecation process isn't intended to be runtime selective like that). I need to flag this directly with our GC team ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19452#pullrequestreview-2087407510 From duke at openjdk.org Thu May 30 07:12:31 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 30 May 2024 07:12:31 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > (1) The following test has passed, which shows performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this pacth(ns/ops) 1885 improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this pacth(ns/ops) 1885 improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this pacth(ns/ops) 1894 improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this pacth(ns/ops) 1891 improvement(%) 18.03% > > (2) jtreg test has passed > > make run-test? TEST=tier1 Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into dev0530 - 8331558: AArch64: optimize integer remainder On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. (1) The following test has passed, which shows performance improvement. 
  make test TEST="micro:java.lang.IntegerDivMod"
  make test TEST="micro:java.lang.LongDivMod"

  * IntegerDivMod.testDivideRemainderUnsigned
    baseline(ns/ops) 2223
    with this patch(ns/ops) 1885
    improvement(%) 17.93%

  * IntegerDivMod.testRemainderUnsigned
    baseline(ns/ops) 2225
    with this patch(ns/ops) 1885
    improvement(%) 18.03%

  * LongDivMod.testDivideRemainderUnsigned
    baseline(ns/ops) 2231
    with this patch(ns/ops) 1894
    improvement(%) 17.79%

  * LongDivMod.testRemainderUnsigned
    baseline(ns/ops) 2232
    with this patch(ns/ops) 1891
    improvement(%) 18.03%

  (2) jtreg test has passed

  make run-test TEST=tier1

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19471/files
  - new: https://git.openjdk.org/jdk/pull/19471/files/12af7ac0..21af82e4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19471&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19471&range=00-01

  Stats: 1143 lines in 39 files changed: 673 ins; 235 del; 235 mod
  Patch: https://git.openjdk.org/jdk/pull/19471.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19471/head:pull/19471

PR: https://git.openjdk.org/jdk/pull/19471

From duke at openjdk.org Thu May 30 07:45:31 2024
From: duke at openjdk.org (kuaiwei)
Date: Thu, 30 May 2024 07:45:31 GMT
Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v6]
In-Reply-To: 
References: 
Message-ID: 

> The original patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues:
> 1. It shows a regression on some platforms, such as Apple silicon on macOS.
> 2. It cannot handle an instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld".
>
> It can be fixed by:
> 1. Enabling AlwaysMergeDMB by default, and disabling it only on architectures where we can see a performance improvement (N1 or N2).
> 2. Checking for the special pattern and merging the subsequent dmb.
>
> It also fixes a bug where st/ld/dmb could not be merged while the code buffer is expanding. I added unit tests for these.
>
> This patch still has an unhandled case.
For a sequence like "dmb.ishld; dmb.ishst; dmb.ish", it merges only the last 2 instructions and cannot merge all three, because merging all previous dmbs when emitting dmb.ish would shrink the code buffer. I think that may break some assumptions, and I don't think it's a common pattern.
>
> In the previous PR https://github.com/openjdk/jdk/pull/18467, I tried an implementation that used a state machine for merging, but keeping an instruction pending during emission looked risky.

kuaiwei has updated the pull request incrementally with one additional commit since the last revision:

  Add comment in aarch64.ad

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19278/files
  - new: https://git.openjdk.org/jdk/pull/19278/files/8ef3e037..7df2103f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19278&range=04-05

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/19278.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19278/head:pull/19278

PR: https://git.openjdk.org/jdk/pull/19278

From kbarrett at openjdk.org Thu May 30 07:53:01 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 30 May 2024 07:53:01 GMT
Subject: RFR: 8333129: Move ShrinkHeapInSteps flag to Serial GC
In-Reply-To: <9r2Ogmj5whqtm0rlT0ecXR_SoZmc6pfcVOtnc273XTo=.87531f95-a7b6-4d67-8133-f8c180dd34e9@github.com>
References: <9r2Ogmj5whqtm0rlT0ecXR_SoZmc6pfcVOtnc273XTo=.87531f95-a7b6-4d67-8133-f8c180dd34e9@github.com>
Message-ID: 

On Thu, 30 May 2024 07:06:59 GMT, David Holmes wrote:

> @zhengyu123 I don't think we can do this in quite such a direct way. This flag was added in JDK 9 under [JDK-8146436](https://bugs.openjdk.org/browse/JDK-8146436) and applied to all GC's AFAICS. Over time it seems to have been relegated to only working with SerialGC, but I can still find articles that reference it for GC tuning e.g.

The flag implementation was added in CardGeneration.
I'm not certain, but I think that class was only ever used by Serial and CMS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19452#issuecomment-2138893102 From kbarrett at openjdk.org Thu May 30 08:08:01 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 30 May 2024 08:08:01 GMT Subject: RFR: 8333129: Move ShrinkHeapInSteps flag to Serial GC In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:36:40 GMT, Zhengyu Gu wrote: > A trivial change that moves Serial GC specific flag `ShrinkHeapInSteps` to `serial_globals.hpp` Looks good. Disavowing triviality to give @dholmes-ora a chance to comment further. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19452#pullrequestreview-2087541754 From jsjolen at openjdk.org Thu May 30 08:19:56 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 30 May 2024 08:19:56 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v116] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
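To illustrate the situation described in the paragraph above, here is a minimal sketch (not code from the PR) that maps the same file region at two different virtual addresses. It assumes a POSIX system and uses an ordinary `tmpfile()` as a stand-in for a memory-backed file; the function name `two_views_share_memory` is hypothetical. Both views alias the same backing pages, so accounting keyed only on virtual address ranges would see two ranges for memory that exists once.

```c++
#include <cassert>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: maps offset 0 of one file twice and checks that a
// write through the first view is visible through the second view.
static bool two_views_share_memory() {
  FILE* file = tmpfile();            // stand-in for a memory-backed file
  if (file == nullptr) return false;
  const int fd = fileno(file);
  const long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  const size_t size = static_cast<size_t>(page);

  bool ok = false;
  if (ftruncate(fd, static_cast<off_t>(size)) == 0) {
    // Two independent mappings of the same file offset.
    void* a = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* b = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a != MAP_FAILED && b != MAP_FAILED) {
      std::strcpy(static_cast<char*>(a), "committed once");
      // Different virtual addresses, same underlying contents.
      ok = (a != b) &&
           std::strcmp(static_cast<const char*>(b), "committed once") == 0;
    }
    if (a != MAP_FAILED) munmap(a, size);
    if (b != MAP_FAILED) munmap(b, size);
  }
  fclose(file);
  return ok;
}
```

Calling `two_views_share_memory()` from a small `main` is enough to try this out. A tracker that records the commit by file offset, as the PR description goes on to propose, would count this page once regardless of how many views exist.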
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode only the reserved regions are printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch also acts as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, and we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Make Treap and VMATree NONCOPYABLE as an accidental copy in tests caused double-free crashes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/9c37beeb..e401a7a4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=115
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=114-115

  Stats: 7 lines in 3 files changed: 4 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org Thu May 30 08:27:37 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 30 May 2024 08:27:37 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v117]
In-Reply-To: 
References: 
Message-ID: 

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode only the reserved regions are printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch also acts as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, and we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Just return int - Using functions for EXPECTs messes up reports when tests fail ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/e401a7a4..ed1e1f21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=116 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=115-116 Stats: 19 lines in 1 file changed: 0 ins; 7 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From duke at openjdk.org Thu May 30 08:30:06 2024 From: duke at openjdk.org (kuaiwei) Date: Thu, 30 May 2024 08:30:06 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 11:34:09 GMT, Aleksey Shipilev wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove tailing white space > > Note that current jcstress run would likely fail due to [JDK-8332670](https://bugs.openjdk.org/browse/JDK-8332670). > > This looks ready to me. I think we need jcstress with C1 and C2, and we should be done. @shipilev , do you agree? > > Yes. Just run jcstress with defaults, maybe limiting the time budget to about 24 hours, and we are done. Default configuration would work through different combinations of C1/C2 compilations for all actors, which is what we want to check for this change: that we don't mess up the barrier emitting scheme in different compilers/interpreters. I can run the jcstress test. 
I will run fastdebug build with `java -jar jcstress-latest.jar -tb 24h` Is it the correct command ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2138995483 From shade at openjdk.org Thu May 30 08:30:07 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 May 2024 08:30:07 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 08:24:55 GMT, kuaiwei wrote: > I can run the jcstress test. I will run fastdebug build with `java -jar jcstress-latest.jar -tb 24h` Is it the correct command ? Yes, I think so. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2139001982 From duke at openjdk.org Thu May 30 08:30:08 2024 From: duke at openjdk.org (kuaiwei) Date: Thu, 30 May 2024 08:30:08 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v6] In-Reply-To: References: <7eML4nr0XN1_QVOO_2tk-yXf8W578S4qb1kA3AoaU8w=.81b03ff5-7ba8-496d-acfe-285ba3de2004@github.com> Message-ID: On Wed, 29 May 2024 11:07:25 GMT, Andrew Haley wrote: >> I checked code again. They will be merged if enable AlwaysMergeDMB. So we can skip the check. > > Add a comment: > > `// These will be merged if AlwaysMergeDMB is enabled.` Comment added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1620242166 From jsjolen at openjdk.org Thu May 30 08:30:31 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 30 May 2024 08:30:31 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v118] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. 
This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode only the reserved regions are printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch also acts as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach.
Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Fix introduced bugs in tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/ed1e1f21..924bce04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=117 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=116-117 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From aph at openjdk.org Thu May 30 08:35:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 May 2024 08:35:03 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v2] In-Reply-To: References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: On Thu, 30 May 2024 07:12:31 GMT, Jin Guojie wrote: >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> (1) The following test has passed, which shows performance improvement. 
>> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this pacth(ns/ops) 1885 improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this pacth(ns/ops) 1885 improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this pacth(ns/ops) 1894 improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this pacth(ns/ops) 1891 improvement(%) 18.03% >> >> (2) jtreg test has passed >> >> make run-test? TEST=tier1 > > Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'openjdk:master' into dev0530 > - 8331558: AArch64: optimize integer remainder > > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > (1) The following test has passed, which shows performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% > > (2) jtreg test has passed > > make run-test? 
TEST=tier1 src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 2299: > 2297: if (VM_Version::is_neoverse()) { > 2298: mul(rscratch2, Rn, Rm); > 2299: sub(Rd, Ra, rscratch2); It's too risky to use `rscratch2` here. Instead, please make another version of `msub` that take a scratch register as an argument. We'll then use the new `msub` in places that we know are safe, such as compiler-generated code, and it won't cause any future maintenance surprises. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19471#discussion_r1620253887 From duke at openjdk.org Thu May 30 09:23:13 2024 From: duke at openjdk.org (Jin Guojie) Date: Thu, 30 May 2024 09:23:13 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > (1) The following test has passed, which shows performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this pacth(ns/ops) 1885 improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this pacth(ns/ops) 1885 improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this pacth(ns/ops) 1894 improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this pacth(ns/ops) 1891 improvement(%) 18.03% > > (2) jtreg test has passed > > make run-test? 
TEST=tier1

Jin Guojie has updated the pull request incrementally with two additional commits since the last revision:

 - Merge branch 'dev0530' of https://github.com/jinguojie-alibaba/jdk into dev0530
 - MacroAssembler::msub() takes a scratch register as an argument

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19471/files
  - new: https://git.openjdk.org/jdk/pull/19471/files/21af82e4..73c7bdc0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19471&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19471&range=01-02

  Stats: 17 lines in 4 files changed: 2 ins; 0 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/19471.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19471/head:pull/19471

PR: https://git.openjdk.org/jdk/pull/19471

From gcao at openjdk.org Thu May 30 09:23:17 2024
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 30 May 2024 09:23:17 GMT
Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859
Message-ID: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com>

Because some dev boards only support RVV version 0.7, in [JDK-8316859](https://bugs.openjdk.org/browse/JDK-8316859) we masked the use of HWCAP to probe for the RVV extension; meanwhile, hwprobe can be used to probe for the V extension on Linux kernel 6.5 and above. However, we recently got a Banana Pi BPI-F3 board, which has RVV 1.0 but runs kernel 6.1.15, so the V extension detected via HWCAP is masked. As a result, enabling UseRVV on the command line produces the warning `RVV is not supported on this CPU`, and UseRVV cannot be enabled.
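For readers unfamiliar with the HWCAP probe mentioned above, here is a minimal sketch, assuming the Linux RISC-V convention that each single-letter ISA extension is reported as bit `letter - 'A'` of the `AT_HWCAP` word. The helper name `hwcap_has_ext` is hypothetical and this is not HotSpot's detection code. Per the message above, boards with RVV 0.7 are the reason this bit was masked.

```c++
#include <cassert>

// Hypothetical helper: true if the single-letter extension `ext` ('A'..'Z')
// is flagged in a RISC-V style HWCAP word (bit index = ext - 'A').
static bool hwcap_has_ext(unsigned long hwcap, char ext) {
  assert(ext >= 'A' && ext <= 'Z');
  return (hwcap & (1UL << (ext - 'A'))) != 0;
}
```

On Linux the word itself would typically come from `getauxval(AT_HWCAP)` in `<sys/auxv.h>`, so the check for the vector extension would read `hwcap_has_ext(getauxval(AT_HWCAP), 'V')`.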
Without Patch:

    zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV
    OpenJDK 64-Bit Server VM warning: RVV is not supported on this CPU
    bool UseRVV = false {ARCH product} {command line}
    bool UseRVVForBigIntegerShiftIntrinsics = false {ARCH product} {default}
    openjdk version "23-internal" 2024-09-17
    OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk)
    OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode)

With Patch:

    zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV
    bool UseRVV = true {ARCH product} {command line}
    bool UseRVVForBigIntegerShiftIntrinsics = true {ARCH product} {default}
    openjdk version "23-internal" 2024-09-17
    OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk)
    OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode)

-------------

Commit messages:
 - 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859

Changes: https://git.openjdk.org/jdk/pull/19472/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19472&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333245
  Stats: 8 lines in 2 files changed: 0 ins; 6 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/19472.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19472/head:pull/19472

PR: https://git.openjdk.org/jdk/pull/19472

From jsjolen at openjdk.org Thu May 30 09:31:45 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 30 May 2024 09:31:45 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v119]
In-Reply-To: 
References: 
Message-ID: 

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT.
This is the situation that ZGC is in, and that is what this patch attempts to fix.
>
> ## `MemoryFileTracker`
>
> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and to commit memory to them. The basic API is:
>
> ```c++
> static MemoryFile* make_device(const char* descriptive_name);
> static void free_device(MemoryFile* device);
>
> static void allocate_memory(MemoryFile* device, size_t offset, size_t size,
>                             MEMFLAGS flag, const NativeCallStack& stack);
> static void free_memory(MemoryFile* device, size_t offset, size_t size);
> ```
>
> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does:
>
> ```c++
> void ZNMT::reserve(zaddress_unsafe start, size_t size) {
>   MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap);
> }
> void ZNMT::commit(zoffset offset, size_t size) {
>   MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC);
> }
> void ZNMT::uncommit(zoffset offset, size_t size) {
>   MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size);
> }
>
> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) {
>   // NMT doesn't track mappings at the moment.
> }
> void ZNMT::unmap(zaddress_unsafe addr, size_t size) {
>   // NMT doesn't track mappings at the moment.
> }
> ```
>
> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode only the reserved regions are printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting, any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory.
>
> This patch also acts as a base on which we deploy multiple new backend ideas to NMT. These ideas are:
>
> 1. Implement VMA tracking using a balanced binary tree approach.
Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this...

Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:

 - Remove comma from Copyright
 - Print via stringStream first

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18289/files
  - new: https://git.openjdk.org/jdk/pull/18289/files/924bce04..c8b90112

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=118
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=117-118

  Stats: 7 lines in 2 files changed: 4 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18289.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289

PR: https://git.openjdk.org/jdk/pull/18289

From jsjolen at openjdk.org Thu May 30 09:49:44 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 30 May 2024 09:49:44 GMT
Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v120]
In-Reply-To: 
References: 
Message-ID: 

> Hi,
>
> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and commit memory to them; the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > ``` > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > ``` > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Device shouldn't be nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/c8b90112..cebd8759 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=119 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=118-119 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From aph at openjdk.org Thu May 30 10:19:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 May 2024 10:19:04 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v3] In-Reply-To: References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: On Thu, 30 May 2024 09:23:13 GMT, Jin Guojie wrote: >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> (1) The following tests have passed and show a performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this patch(ns/ops) 1885 improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this patch(ns/ops) 1885 improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this patch(ns/ops) 1894 improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this patch(ns/ops) 1891 improvement(%) 18.03% >> >> (2) jtreg test has passed >> >> make run-test
TEST=tier1 > > Jin Guojie has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'dev0530' of https://github.com/jinguojie-alibaba/jdk into dev0530 > - MacroAssembler::msub() takes a scratch register as an argument src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 446: > 444: > 445: void msub(Register Rd, Register Rn, Register Rm, Register Ra, Register tmp = rscratch2); > 446: void msubw(Register Rd, Register Rn, Register Rm, Register Ra, Register tmp = rscratch2); Please delete these two methods that use rscratch2 as a default tmp register. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19471#discussion_r1620421085 From rehn at openjdk.org Thu May 30 10:25:01 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 30 May 2024 10:25:01 GMT Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> Message-ID: <-BGJp2rN3kaQwIlxfDAim6BEBRZeivSQV5KXKzgUNxY=.19198efa-186f-4209-8c1a-1e6dea10dc67@github.com> On Thu, 30 May 2024 09:13:30 GMT, Gui Cao wrote: > Because some dev boards only support RVV version 0.7, In [JDK-8316859](https://bugs.openjdk.org/browse/JDK-8316859) we masked the use of HWCAP to probe for RVV extensions, and in the meantime, we can use hwprobe to probe for V extensions in Linux kernel 6.5 and above. But recently we got Banana Pi BPI-F3 board (has RVV1.0), but his kernel is 6.1.15, so the V extensions detected by HWCAP are masked. And we get the warning: `RVV is not supported on this CPU` when we enable UseRVV with the command, and we can't enable UseRVV correctly. 
> > Without Patch: > > zifeihan@bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > OpenJDK 64-Bit Server VM warning: RVV is not supported on this CPU > bool UseRVV = false {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = false {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) > > > With Patch: > > zifeihan@bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > bool UseRVV = true {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = true {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) If you don't have RVV you are not guaranteed to have Zicsr, meaning the csrr read of vlenb may crash? If you have csr, what will that return in this case? (no V but Zicsr) You also need kernel support for RVV: - You must turn on V from privileged mode; the kernel needs to do this. - If you are context switched in the middle of your vector code, the kernel must save all those V registers. Only kernels with hwprobe reporting RVV are guaranteed to do this. If this is a vanilla 6.1.15 you can't use V, AFAIK. If there are out-of-tree patches on top of this to make V work, they need to add the hwprobe patches also. So I would suggest something like: - If UseRVV and hwcap V = true but no hwprobe. - Test if we can csrr in a safe-fetch manner. - If we can, we try to read the vector context status field, VS, to determine if it's on or off (in a 'safe fetch' manner). - If that succeeds, we store something in v0, change CPU using affinity mask, and verify that v0 contains that value after the change of CPU.
- Now you just need to cross fingers that it is v1.0 :) (this will still fail on THEAD) Or similar, as we don't want the VM to crash just because a user added +UseRVV erroneously. Note in 6.7 there is `prctl(PR_RISCV_V_SET_CONTROL, unsigned long arg)` to turn on/off V for a thread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19472#issuecomment-2139241648 From shade at openjdk.org Thu May 30 11:45:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 May 2024 11:45:02 GMT Subject: RFR: 8333133: Simplify QuickSort::sort In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:52:03 GMT, Kim Barrett wrote: > The "idempotent" argument is removed from that function, with associated > simplifications to the implementation. Callers are updated to remove that > argument. Callers that were providing a false value are unaffected in their > behavior. The 3 callers that were providing a true value to request the > associated feature are also unaffected (other than by being made faster), > because the arrays involved don't contain any equivalent pairs. > > There are also some miscellaneous cleanups, including using the swap utility > and fixing some comments. > > Testing: mach5 tier1-3 Looks reasonable. src/hotspot/share/utilities/quickSort.hpp line 75: > 73: for ( ; true; ++left_index, --right_index) { > 74: for ( ; comparator(array[left_index], pivot_val) < 0; ++left_index) { > 75: assert(left_index < (length - 1), "reached end of partition"); Let me see if I understand this change. It makes the assert stronger: we do not accept `left_index == length - 1` anymore. I guess that would mean the pivot is at the last element? Which makes the partition empty, which cannot happen? ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19464#pullrequestreview-2088033775 PR Review Comment: https://git.openjdk.org/jdk/pull/19464#discussion_r1620546735 From mdoerr at openjdk.org Thu May 30 12:12:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 30 May 2024 12:12:07 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. > Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > 
debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... Please note that `allocate` sets `*mem_ptr` to `nullptr` if the size is 0. This is not an error. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19450#issuecomment-2139416153 From thartmann at openjdk.org Thu May 30 12:43:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 12:43:23 GMT Subject: RFR: 8333264: Remove unused resolve_sub_helper declaration after JDK-8322630 Message-ID: The `resolve_sub_helper` declaration is unused. Noticed this when working on the Valhalla merge. 
Thanks, Tobias ------------- Commit messages: - 8333264: Remove unused resolve_sub_helper declaration after JDK-8322630 Changes: https://git.openjdk.org/jdk/pull/19476/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19476&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333264 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19476/head:pull/19476 PR: https://git.openjdk.org/jdk/pull/19476 From rcastanedalo at openjdk.org Thu May 30 12:49:01 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 30 May 2024 12:49:01 GMT Subject: RFR: 8333264: Remove unused resolve_sub_helper declaration after JDK-8322630 In-Reply-To: References: Message-ID: On Thu, 30 May 2024 12:39:17 GMT, Tobias Hartmann wrote: > The `resolve_sub_helper` declaration is unused. Noticed this when working on the Valhalla merge. > > Thanks, > Tobias Looks good and trivial. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19476#pullrequestreview-2088185262 From thartmann at openjdk.org Thu May 30 12:52:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 12:52:04 GMT Subject: RFR: 8333264: Remove unused resolve_sub_helper declaration after JDK-8322630 In-Reply-To: References: Message-ID: On Thu, 30 May 2024 12:39:17 GMT, Tobias Hartmann wrote: > The `resolve_sub_helper` declaration is unused. Noticed this when working on the Valhalla merge. > > Thanks, > Tobias Thanks Roberto! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19476#issuecomment-2139486849 From zgu at openjdk.org Thu May 30 13:08:04 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 30 May 2024 13:08:04 GMT Subject: RFR: 8333129: Move ShrinkHeapInSteps flag to Serial GC In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:36:40 GMT, Zhengyu Gu wrote: > A trivial change that moves Serial GC specific flag `ShrinkHeapInSteps` to `serial_globals.hpp` I can confirm that `ShrinkHeapInSteps` flag is only used in `CardGeneration` and `CardGeneration` is only used in Serial and CMS in JDK9u source. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19452#issuecomment-2139515483 From jsjolen at openjdk.org Thu May 30 13:19:41 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 30 May 2024 13:19:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v121] In-Reply-To: References: Message-ID: > Hi, > > This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. 
> > ## `MemoryFileTracker` > > The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and commit memory to them; the basic API is: > > ```c++ > static MemoryFile* make_device(const char* descriptive_name); > static void free_device(MemoryFile* device); > > static void allocate_memory(MemoryFile* device, size_t offset, size_t size, > MEMFLAGS flag, const NativeCallStack& stack); > static void free_memory(MemoryFile* device, size_t offset, size_t size); > ``` > > It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: > > ```c++ > void ZNMT::reserve(zaddress_unsafe start, size_t size) { > MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); > } > void ZNMT::commit(zoffset offset, size_t size) { > MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); > } > void ZNMT::uncommit(zoffset offset, size_t size) { > MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); > } > > void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { > // NMT doesn't track mappings at the moment. > } > void ZNMT::unmap(zaddress_unsafe addr, size_t size) { > // NMT doesn't track mappings at the moment. > } > ``` > > As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. > > This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: > > 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it.
Our Treap-based approach in this patch gives a performance boost such that we see 25x better performance in a benchmark. The idea and draft of this... Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Check for nullptr afterwards ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18289/files - new: https://git.openjdk.org/jdk/pull/18289/files/cebd8759..fd165407 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=120 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18289&range=119-120 Stats: 8 lines in 1 file changed: 3 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18289/head:pull/18289 PR: https://git.openjdk.org/jdk/pull/18289 From sgibbons at openjdk.org Thu May 30 13:19:59 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:19:59 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 06:23:05 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > test/jdk/java/lang/String/IndexOf.java line 35: > >> 33: * @requires vm.cpu.features ~= ".*avx2.*" >> 34: * @requires vm.compiler2.enabled >> 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf > > Same here: why is the test AVX2 specific? Could other platforms not also be "tickled" in interesting ways with this test? There are two test blocks, so all platforms will be able to take advantage of the test via the first block. I'm told that's how this works. > test/jdk/java/lang/StringBuffer/IndexOf.java line 188: > >> 186: } >> 187: >> 188: } > > It looks like you just indented basically the whole file by 1 space. Why?
I hadn't noticed this. It's most likely an artifact of my editor as it wasn't intentional. I'll change this back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620669257 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620679629 From sgibbons at openjdk.org Thu May 30 13:20:01 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:20:01 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> References: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> Message-ID: On Thu, 30 May 2024 06:22:17 GMT, Emanuel Peter wrote: >> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: >> >>> 27: * @requires vm.cpu.features ~= ".*avx2.*" >>> 28: * @requires vm.compiler2.enabled >>> 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf >> >> Does this test really need to be `avx2` specific? Does it even need to be C2 specific? >> Or can this run on all platforms? > > Would be a shame to spend so much time on writing a test and then not apply it everywhere ;) I'll add a separate @test block to this file. It was, however, written specifically tuned for the new algorithm to exercise known edge cases. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620676513 From jsjolen at openjdk.org Thu May 30 13:19:41 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 30 May 2024 13:19:41 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v120] In-Reply-To: References: Message-ID: <1GrkKo76eQlJz-RRYFMysxygcQIneEHzpygUqbJ_odU=.f611e2ed-963a-4c34-920c-c7ffc4c2e282@github.com> On Thu, 30 May 2024 09:49:44 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix. >> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability to add new virtual memory address spaces to NMT and commit memory to them; the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> ``` >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast<size_t>(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void
ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> ``` >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory; the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Device shouldn't be nullptr Tier 1-3 entirely green except for an issue on debug builds where we assert that file != nullptr before we check whether NMT is enabled. If NMT is disabled, then `register_file` will return nullptr, which will cause those asserts to be hit. I've moved the asserts. I've changed the summary semantics so that all commits in `MemoryFileTracker` are both reserved and committed. This means that we reserve address ranges both in the 'ordinary' virtual memory space and in the MemoryFile. Both of these are accounted for under the same memory flag in summary mode. This is a form of close-to-but-not-really double-accounting. My question is: Is this acceptable? I'm asking the ZGC people about this now.
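The assert-ordering issue described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `g_nmt_enabled`, the `MemoryFile` struct layout, `free_file`, and the simplified two-argument `allocate_memory_in` are hypothetical stand-ins, and only the name `register_file` and its "returns nullptr when NMT is disabled" behaviour come from the message.

```c++
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for illustration only; not the real HotSpot/NMT types.
struct MemoryFile {
  std::string name;
  size_t committed = 0;
};

// Assumed global switch standing in for "is NMT enabled?".
static bool g_nmt_enabled = false;

// Returns nullptr when tracking is disabled, as described in the message.
MemoryFile* register_file(const char* name) {
  if (!g_nmt_enabled) return nullptr;
  return new MemoryFile{name};
}

// Check whether tracking is enabled first; only then assert on the file.
// With the opposite order, a debug build would hit the assert whenever
// tracking is off, because register_file() legitimately returned nullptr.
void allocate_memory_in(MemoryFile* file, size_t size) {
  if (!g_nmt_enabled) return;
  assert(file != nullptr && "file must be registered when tracking is enabled");
  file->committed += size;
}

// Releases a file obtained from register_file(); deleting nullptr is a no-op.
void free_file(MemoryFile* file) { delete file; }
```

With tracking off, `allocate_memory_in(register_file("heap"), 4096)` simply returns; with tracking on, the same call adds 4096 to the file's `committed` counter.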
------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2139537892 From sgibbons at openjdk.org Thu May 30 13:19:57 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:19:57 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: Message-ID: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>
> Benchmark                              Score      Latest
> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
> StringIndexOf.success                  9.186      9.713      1.057369911x
> StringIndexOf.successBig               14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803
Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: - Stupid EOL at end - Add @test block; fix test indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new:
https://git.openjdk.org/jdk/pull/16753/files/ed06edd6..3e150fe3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=47-48 Stats: 166 lines in 2 files changed: 7 ins; 0 del; 159 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From fyang at openjdk.org Thu May 30 13:31:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 30 May 2024 13:31:03 GMT Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: <-BGJp2rN3kaQwIlxfDAim6BEBRZeivSQV5KXKzgUNxY=.19198efa-186f-4209-8c1a-1e6dea10dc67@github.com> References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> <-BGJp2rN3kaQwIlxfDAim6BEBRZeivSQV5KXKzgUNxY=.19198efa-186f-4209-8c1a-1e6dea10dc67@github.com> Message-ID: On Thu, 30 May 2024 10:21:28 GMT, Robbin Ehn wrote: > If you don't have RVV you are not guaranteed to have Zicsr, meaning the csrr read of vlenb may crash? If you have csr, what will that return in this case? (no V but Zicsr) > > You also need kernel support for RVV: > > * You must turn on V from privileged mode; the kernel needs to do this. > * If you are context switched in the middle of your vector code, the kernel must save all those V registers. > > Only kernels with hwprobe reporting RVV are guaranteed to do this. > > If this is a vanilla 6.1.15 you can't use V, AFAIK. If there are out-of-tree patches on top of this to make V work, they need to add the hwprobe patches also. That makes sense to me. I think you mean kernel versions >= 6.5 where process context switch and hwprobe support for RVV are both added at the same time [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/commit/?h=for-next&id=d5e45e810e0e08114035d31d88049544c038e6fc > So I would suggest something like: > > * If UseRVV and hwcap V = true but no hwprobe. > * Test if we can csrr in a safe-fetch manner. > * If we can, we try to read the vector context status field, VS, to determine if it's on or off (in a 'safe fetch' manner). > * If that succeeds, we store something in v0, change CPU using affinity mask, and verify that v0 contains that value after the change of CPU. > * Now you just need to cross fingers that it is v1.0 :) (this will still fail on THEAD) > > Or similar, as we don't want the VM to crash just because a user added +UseRVV erroneously. > > Note in 6.7 there is `prctl(PR_RISCV_V_SET_CONTROL, unsigned long arg)` to turn on/off V for a thread. That sounds very tricky given that we have both rvv-0.7.1 and rvv-1.0 hardware for now. I think it will be safer and simpler for us to rely on the availability of the hwprobe syscall. I guess it won't be a big issue when we do some simple performance evaluations like a simple JMH run, but yes, you have to change the code to force-enable UseRVV when running on the older kernels. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19472#issuecomment-2139557926 From sgibbons at openjdk.org Thu May 30 13:36:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 13:36:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 06:25:32 GMT, Tobias Hartmann wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. Thank you all for the comments. @TobiHartmann I'm comfortable with this going into JDK 23.
The code has been functionally stable for me for the past 2 months. The recent churn centers primarily around restructuring the code for readability and maintainability and ensuring protection against reading past the end of strings. Both Vlad (Volodymyr) and @sviswa7 have scoured the code with me and together we have convinced ourselves that we've covered all the bases. Of course we may have missed something but my confidence is high. The overall performance gain as reported by the StringIndexOf JMH averages ~7x running on an e-core as compared with baseline on the same core. It's skewed somewhat towards massive gains for long (~2K) strings (avg 14.4x) and modest gains for small-ish strings (avg ~1.8x). I've measured up to 60x performance improvement for some 2K UTF-16 indexOf operations. Again, thank you all. It's been a fun exercise and I've learned a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139569361 From thartmann at openjdk.org Thu May 30 13:41:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 May 2024 13:41:10 GMT Subject: Integrated: 8333264: Remove unused resolve_sub_helper declaration after JDK-8322630 In-Reply-To: References: Message-ID: On Thu, 30 May 2024 12:39:17 GMT, Tobias Hartmann wrote: > The `resolve_sub_helper` declaration is unused. Noticed this when working on the Valhalla merge. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 921860d4 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/921860d41da2fac180d44a5cdf891b4f660945bc Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8333264: Remove unused resolve_sub_helper declaration after JDK-8322630 Reviewed-by: rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/19476 From tanksherman27 at gmail.com Thu May 30 13:47:09 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Thu, 30 May 2024 21:47:09 +0800 Subject: Structure of the HotSpot Interpreter Message-ID: Hi all, I've recently been trying to learn more about HotSpot and studying its internals, but the structure of the Interpreter seems to elude me still. I'm aware that HotSpot doesn't use a traditional switch case (Well, at least not usually, looking at you Zero Port), but how it functions is more or less still a black box to me. What kind of dispatch mechanism does it use, for instance? Is it Direct Threaded, Indirect Threaded, Token Threaded, or something else entirely? Is there somewhere I can learn about how everything connects together? 
I've tried reading the HotSpot documentation online but there doesn't seem to be an in-depth explanation in them for how it all fits together, I'd greatly appreciate if someone points me in the right direction best regards, Julian From rehn at openjdk.org Thu May 30 13:53:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 30 May 2024 13:53:05 GMT Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> Message-ID: On Thu, 30 May 2024 09:13:30 GMT, Gui Cao wrote: > Because some dev boards only support RVV version 0.7, In [JDK-8316859](https://bugs.openjdk.org/browse/JDK-8316859) we masked the use of HWCAP to probe for RVV extensions, and in the meantime, we can use hwprobe to probe for V extensions in Linux kernel 6.5 and above. But recently we got Banana Pi BPI-F3 board (has RVV1.0), but his kernel is 6.1.15, so the V extensions detected by HWCAP are masked. And we get the warning: `RVV is not supported on this CPU` when we enable UseRVV with the command, and we can't enable UseRVV correctly. 
> > Without Patch: > > zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > OpenJDK 64-Bit Server VM warning: RVV is not supported on this CPU > bool UseRVV = false {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = false {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) > > > With Patch: > > zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > bool UseRVV = true {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = true {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) Another suggestion: it seems like you can get the triplet mvendorid/marchid/mimpid from /proc/cpuinfo. So, when hwprobe is not available, we can grab those from VM_Version::os_uarch_additional_features(), set those three, and in VM_Version::vendor_features() check whether this is a Banana Pi. Then, with a big warning that the kernel does not support vector, let the user run with vector? That way THEAD is unaffected. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19472#issuecomment-2139602741 From epeter at openjdk.org Thu May 30 13:59:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 13:33:40 GMT, Scott Gibbons wrote: >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > > Thank you all for the comments. @TobiHartmann I'm comfortable with this going into JDK 23.
The code has been functionally stable for me for the past 2 months. The recent churn centers primarily around restructuring the code for readability and maintainability and ensuring protection against reading past the end of strings. Both Vlad (Volodymyr) and @sviswa7 have scoured the code with me and together we have convinced ourselves that we've covered all the bases. Of course we may have missed something but my confidence is high. > > The overall performance gain as reported by the StringIndexOf JMH averages ~7x running on an e-core as compared with baseline on the same core. It's skewed somewhat towards massive gains for long (~2K) strings (avg 14.4x) and modest gains for small-ish strings (avg ~1.8x). I've measured up to 60x performance improvement for some 2K UTF-16 indexOf operations. > > Again, thank you all. It's been a fun exercise and I've learned a lot. @asgibbons generally it would be nice if you waited for me to accept your changes before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139604424 From epeter at openjdk.org Thu May 30 13:59:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 13:19:57 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: > > - Stupid EOL at end > - Add @test block; fix test indentation test/jdk/java/lang/String/IndexOf.java line 25: > 23: > 24: /* > 25: * @test You should add the `@bug 8320448` for all runs. test/jdk/java/lang/String/IndexOf.java line 27: > 25: * @test > 26: * @summary test String indexOf() intrinsic > 27: * @run main/othervm IndexOf Suggestion: * @run main IndexOf You do not need a new VM if you have no arguments ;) test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > 23: > 24: /* @test > 25: * @bug 4162796 4162796 You need to fix the bug numbers. 
test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 27: > 25: * @bug 4162796 4162796 > 26: * @summary Test indexOf and lastIndexOf > 27: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf I would also add a line without `-XX:-TieredCompilation`, then C1 can be tested with this too test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 32: > 30: > 31: /* @test > 32: * @bug 4162796 4162796 Here too ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620760730 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620756896 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620753321 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620754948 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620753577 From epeter at openjdk.org Thu May 30 13:59:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:15 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> On Thu, 30 May 2024 06:25:32 GMT, Tobias Hartmann wrote: > Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times. 
Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139615822 From epeter at openjdk.org Thu May 30 13:59:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: <_0H1QRaXnFyO9eGa7IvO1l4ZzNK_27D59ebYAphp8eg=.0fe38944-0b61-4a1a-b63d-04315b02117f@github.com> Message-ID: On Thu, 30 May 2024 13:03:06 GMT, Scott Gibbons wrote: >> Would be a shame to spend so much time on writing a test and then not apply it everywhere ;) > > I'll add a separate @test block to this file. It was, however, written specifically tuned for the new algorithm to exercise known edge cases. A new `@test` sounds like a good idea ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620747402 From epeter at openjdk.org Thu May 30 13:59:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 13:59:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 12:58:27 GMT, Scott Gibbons wrote: >> test/jdk/java/lang/String/IndexOf.java line 35: >> >>> 33: * @requires vm.cpu.features ~= ".*avx2.*" >>> 34: * @requires vm.compiler2.enabled >>> 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts IndexOf >> >> Same here: why is the test AVX2 specific? Could other platforms not also be "tickled" in interesting ways with this test? > > There are two test blocks, so all platforms will be able to take advantage of the test via the first block. I'm told that's how this works. Yes, that is right. Good. 
>> test/jdk/java/lang/StringBuffer/IndexOf.java line 188: >> >>> 186: } >>> 187: >>> 188: } >> >> It looks like you just indented basically the whole file by 1 space. Why? > > I hadn't noticed this. It's most likely an artifact of my editor as it wasn't intentional. I'll change this back. Ok, maybe check your code on GitHub next time ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620768228 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620746147 From jsjolen at openjdk.org Thu May 30 14:06:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 30 May 2024 14:06:15 GMT Subject: RFR: 8312132: Add tracking of multiple address spaces in NMT [v121] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 13:19:41 GMT, Johan Sjölen wrote: >> Hi, >> >> This PR introduces a new abstraction to NMT, named `MemoryFileTracker`. Today, NMT does not track any memory outside of the virtual memory address space. This means that if you allocated memory in something such as a memory-backed file and use `mmap` to map into that memory, then you'll have trouble reporting this to NMT. This is the situation that ZGC is in, and that is what this patch attempts to fix.
>> >> ## `MemoryFileTracker` >> >> The `MemoryFileTracker` adds the ability of adding new virtual memory address spaces to NMT and committing memory to these, the basic API is: >> >> ```c++ >> static MemoryFile* make_device(const char* descriptive_name); >> static void free_device(MemoryFile* device); >> >> static void allocate_memory(MemoryFile* device, size_t offset, size_t size, >> MEMFLAGS flag, const NativeCallStack& stack); >> static void free_memory(MemoryFile* device, size_t offset, size_t size); >> >> >> It is easiest to see how this is used by looking at what ZGC's `ZNMT` class does: >> >> ```c++ >> void ZNMT::reserve(zaddress_unsafe start, size_t size) { >> MemTracker::record_virtual_memory_reserve((address)start, size, CALLER_PC, mtJavaHeap); >> } >> void ZNMT::commit(zoffset offset, size_t size) { >> MemTracker::allocate_memory_in(ZNMT::_device, static_cast(offset), size, mtJavaHeap, CALLER_PC); >> } >> void ZNMT::uncommit(zoffset offset, size_t size) { >> MemTracker::free_memory_in(ZNMT::_device, (size_t)offset, size); >> } >> >> void ZNMT::map(zaddress_unsafe addr, size_t size, zoffset offset) { >> // NMT doesn't track mappings at the moment. >> } >> void ZNMT::unmap(zaddress_unsafe addr, size_t size) { >> // NMT doesn't track mappings at the moment. >> } >> >> >> As you can see, any mapping between reserved regions and device-allocated memory is not recorded in NMT. This means that in detailed mode you only get reserved regions printed for the reserved memory, the device-allocated memory is reported separately. When performing summary reporting any memory allocated via these devices is added to the corresponding `MEMFLAGS` as `committed` memory. >> >> This patch is also acting as a base on which we deploy multiple new backend ideas to NMT. These ideas are: >> >> 1. Implement VMA tracking using a balanced binary tree approach. Today's `VirtualMemoryTracker`'s usage of linked lists is slow and brittle, we'd like to move away from it. 
Our Treap-based approach in this patch gives a performance bo... > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Check for nullptr afterwards An example which I hope clears things up: The result of this depends on the calling sequence of ZNMT that ZGC does. Let's say we have something like: ZNMT::reserve(0xdeadbeef, 500); ZNMT::commit(123, 500); That is, each reserve has an equivalent commit. Then, when doing a summary from NMT, you'll get: mtJavaHeap: 1000 bytes reserved, 500 bytes committed since the commit call also reserves. In the detailed case you will see something like: Virtual memory map: [ 0xdeadbeef - 0xdeadbeef + 500 ] 500 bytes reserved for mtJavaHeap Memory file details: Allocations of ZGC heap file: [ 123 - 123 + 500 ] 500 bytes reserved and committed for mtJavaHeap ------------- PR Comment: https://git.openjdk.org/jdk/pull/18289#issuecomment-2139630041 From aph-open at littlepinkcloud.com Thu May 30 14:18:18 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Thu, 30 May 2024 15:18:18 +0100 Subject: Structure of the HotSpot Interpreter In-Reply-To: References: Message-ID: <1a2f7615-895a-4964-9d26-7f92cbdb17ea@littlepinkcloud.com> On 5/30/24 14:47, Julian Waters wrote: > I've recently been trying to learn more about HotSpot and studying its > internals, but the structure of the Interpreter seems to elude me > still. I'm aware that HotSpot doesn't use a traditional switch case > (Well, at least not usually, looking at you Zero Port), but how it > functions is more or less still a black box to me. What kind of > dispatch mechanism does it use, for instance? Is it Direct Threaded, > Indirect Threaded, Token Threaded, or something else entirely? Is > there somewhere I can learn about how everything connects together?
> I've tried reading the HotSpot documentation online but there doesn't > seem to be an in-depth explanation in them for how it all fits > together, I'd greatly appreciate if someone points me in the right > direction It's a bytecode interpreter, so token threaded. Print it out with -XX:+PrintInterpreter. Here's lmul (for Arm). lmul 105 lmul [0x0000ffff785428c0, 0x0000ffff785428e0] 32 bytes -------------------------------------------------------------------------------- 0x0000ffff785428c0: ldr x0, [x20], #0x10 ;;@FILE: /home/aph/theRealAph-jdk/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp ;; 359: case ltos: vep = __ pc(); __ pop(ltos); lep = __ pc(); generate_and_dispatch(t); break; 0x0000ffff785428c4: ldr x1, [x20], #0x10 ;; 359: case ltos: vep = __ pc(); __ pop(ltos); lep = __ pc(); generate_and_dispatch(t); break; ;; 378: __ verify_FPU(1, t->tos_in()); ;; 391: __ dispatch_prolog(tos_out, step); 0x0000ffff785428c8: mul x0, x0, x1 Fetch the next bytecode. x22 is the bytecode pointer: 0x0000ffff785428cc: ldrb w8, [x22, #1]! ;; 403: __ dispatch_epilog(tos_out, step); Offset to the dispatch table: 0x0000ffff785428d0: add w9, w8, #0x500 Load the address of the next action, and jump to it: 0x0000ffff785428d4: ldr x9, [x21, w9, uxtw #3] 0x0000ffff785428d8: br x9 The interesting source is in the various templateInterpreter and templateTable files. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
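The dispatch sequence in the listing above (fetch the next bytecode, index a table of handler addresses, jump) can be sketched in Java. This is only an illustrative toy, not HotSpot code: the class `TokenThreadedSketch`, the three opcode values and the handler table are all invented for this example. It also uses a central loop for the fetch-and-index step, whereas in the listing each generated template ends with its own dispatch sequence.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TokenThreadedSketch {
    // Hypothetical opcodes for this sketch only (not real JVM bytecode values).
    static final int PUSH = 0, LMUL = 1, HALT = 2;

    interface Handler { void run(TokenThreadedSketch vm); }

    // Dispatch table indexed by the bytecode value (the "token").
    // In the Arm listing, x21 plays the role of the table base.
    static final Handler[] TABLE = {
        // PUSH imm8: push the operand byte, then step over opcode + operand
        vm -> { vm.stack.push((long) vm.code[vm.bcp + 1]); vm.bcp += 2; },
        // LMUL: pop two longs, push their product, step over the opcode
        vm -> { long b = vm.stack.pop(); long a = vm.stack.pop(); vm.stack.push(a * b); vm.bcp += 1; },
        // HALT: stop the dispatch loop
        vm -> vm.running = false
    };

    final byte[] code;
    final Deque<Long> stack = new ArrayDeque<>();
    int bcp = 0;              // bytecode pointer (x22 in the listing)
    boolean running = true;

    public TokenThreadedSketch(byte[] code) { this.code = code; }

    public long run() {
        while (running) {
            int token = code[bcp] & 0xFF;  // fetch the next bytecode
            TABLE[token].run(this);        // look up its handler and transfer control
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        byte[] program = { PUSH, 6, PUSH, 7, LMUL, HALT };
        System.out.println(new TokenThreadedSketch(program).run()); // prints 42
    }
}
```

The point of the table is that the bytecode stream stores only small tokens, and the mapping from token to handler lives in one place.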
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sgibbons at openjdk.org Thu May 30 15:00:19 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:00:19 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> References: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> Message-ID: On Thu, 30 May 2024 13:56:30 GMT, Emanuel Peter wrote: >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > > I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times. Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) @eme64 I'm glad to have received your feedback. I see I have erroneously assumed that by making the exact code change you requested still requires your acceptance - I won't make that mistake again. I had also erroneously assumed that your review was complete and you had no further changes for me to make. I'd also not like to make that mistake again, but I'm unsure how to conclude that a review is complete - it seems like 7 hours of elapsed time isn't sufficient to indicate completion, so can you please help me figure this out? Perhaps it's just my distaste for "trickle-in" comments, which I should get over, or is there another way you can suggest? As for the fuzzer I would be very interested in learning more about this. 
We have a significant number of compute resources, so it may be valuable for us to set up a copy of the fuzzer on-site to improve the quality of our submissions. Can you help in pointing me to someone that can advise me on how to do this? As for holding off the integration, I'll leave the decision to a sponsor for this PR. I don't believe increasing the reviewer count just to "force" reevaluation should be an acceptable practice, although I'm not an insider in this community. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139814010 From duke at openjdk.org Thu May 30 15:19:17 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 30 May 2024 15:19:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> References: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> Message-ID: <3r6BovGjkFUudXIeF6FF3ODENJ5F_wdHG1z4eyjpI-Y=.61eb125c-932d-4713-93fe-9f9ccb6584e4@github.com> On Thu, 30 May 2024 13:56:30 GMT, Emanuel Peter wrote: >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > >> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. > > I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times. Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) @eme64 I guess to add some confidence.. we did also 'test it independently' to catch blind spots. i.e. `String/IndexOf.java` is from me. I tried to be as paranoid as possible with non-random strings. 
Passed everything I could throw at it.. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139882544 From epeter at openjdk.org Thu May 30 15:19:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:19:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v48] In-Reply-To: References: <_4hKqcW7tE4shxVqG8Et3BjeehNjl0NWvS7PCKZaLe0=.73dc8315-22ee-47c0-8f5b-be74edc2f7a3@github.com> Message-ID: <2MrjPeUReR3CJbw_L3K92H8O7xrKSIdZVzfpf7LVkIM=.dab21bd9-b149-4917-92dd-3e6abcca482b@github.com> On Thu, 30 May 2024 14:57:35 GMT, Scott Gibbons wrote: >>> Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. >> >> I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times. Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) > > @eme64 I'm glad to have received your feedback. I see I have erroneously assumed that by making the exact code change you requested still requires your acceptance - I won't make that mistake again. I had also erroneously assumed that your review was complete and you had no further changes for me to make. I'd also not like to make that mistake again, but I'm unsure how to conclude that a review is complete - it seems like 7 hours of elapsed time isn't sufficient to indicate completion, so can you please help me figure this out? Perhaps it's just my distaste for "trickle-in" comments, which I should get over, or is there another way you can suggest? > > As for the fuzzer I would be very interested in learning more about this. We have a significant number of compute resources, so it may be valuable for us to set up a copy of the fuzzer on-site to improve the quality of our submissions. 
Can you help in pointing me to someone that can advise me on how to do this? > > As for holding off the integration, I'll leave the decision to a sponsor for this PR. I don't believe increasing the reviewer count just to "force" reevaluation should be an acceptable practice, although I'm not an insider in this community. @asgibbons I was done with my review, or at least so I thought. Still: if I give comments, it would be nice to quickly finish the conversation, unless I don't respond for many days, not even to emails. Often I only see the glaring issues. Then you fix them, and then I see something else around it. Then I may give more comments. That is what happened. If I think that I have small suggestions and then I'm done, then I might even approve even though there are suggestions still to be added. I just raised the reviewer limit quickly so that nobody else would sponsor it by accident before we have finished the conversation, and I will definitely give you my approval once the little issues are resolved ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139893561 From epeter at openjdk.org Thu May 30 15:19:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:19:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> On Thu, 30 May 2024 13:19:57 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: > > - Stupid EOL at end > - Add @test block; fix test indentation About the fuzzer: we have it in our closed tests. 
But I think it comes from this: https://github.com/shipilev/JavaFuzzer ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139901477 From sgibbons at openjdk.org Thu May 30 15:27:18 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:27:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 13:50:01 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: >> >> - Stupid EOL at end >> - Add @test block; fix test indentation > > test/jdk/java/lang/String/IndexOf.java line 25: > >> 23: >> 24: /* >> 25: * @test > > You should add the `@bug 8320448` for all runs. Done. > test/jdk/java/lang/String/IndexOf.java line 27: > >> 25: * @test >> 26: * @summary test String indexOf() intrinsic >> 27: * @run main/othervm IndexOf > > Suggestion: > > * @run main IndexOf > > You do not need a new VM if you have no arguments ;) Done. > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > >> 23: >> 24: /* @test >> 25: * @bug 4162796 4162796 > > You need to fix the bug numbers. Done. > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 27: > >> 25: * @bug 4162796 4162796 >> 26: * @summary Test indexOf and lastIndexOf >> 27: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf > > I would also add a line without `-XX:-TieredCompilation`, then C1 can be tested with this too Done. > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 32: > >> 30: >> 31: /* @test >> 32: * @bug 4162796 4162796 > > Here too Done. 
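The review comments above (one `@bug 8320448` entry per test block, plain `@run main` when no VM flags are passed, and an extra run without `-XX:-TieredCompilation` so that C1 is exercised too) could be combined into a jtreg header along these lines. This is a hedged sketch, not the header from the PR: the class name `IndexOfHeaderSketch` and its `check` helper are invented for illustration, and the flags are reduced to the ones mentioned in the thread.

```java
/*
 * @test
 * @bug 8320448
 * @summary test String indexOf() intrinsic
 * @run main IndexOfHeaderSketch
 */

/*
 * @test
 * @bug 8320448
 * @summary same test, once C2-only and once with tiered compilation
 * @run main/othervm -Xbatch -XX:-TieredCompilation IndexOfHeaderSketch
 * @run main/othervm -Xbatch IndexOfHeaderSketch
 */

public class IndexOfHeaderSketch {
    // Hypothetical helper: the call that the intrinsic would accelerate.
    static int check(String haystack, String needle) {
        return haystack.indexOf(needle);
    }

    public static void main(String[] args) {
        if (check("hotspot", "spot") != 3) {
            throw new RuntimeException("expected index 3");
        }
        if (check("hotspot", "xyz") != -1) {
            throw new RuntimeException("expected -1 for a missing substring");
        }
        System.out.println("ok");
    }
}
```

The first block needs no `othervm` because it passes no VM arguments; the second block needs it because each `@run` line sets flags for its own VM.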
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620951690 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620949315 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620945040 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620947641 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620945484 From sgibbons at openjdk.org Thu May 30 15:30:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:30:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>
> Benchmark                                  Score       Latest      Ratio (Latest/Score)
> StringIndexOf.advancedWithMediumSub        343.573     317.934     0.925375393x
> StringIndexOf.advancedWithShortSub1        1039.081    1053.96     1.014319384x
> StringIndexOf.advancedWithShortSub2        55.828      110.541     1.980027943x
> StringIndexOf.constantPattern              9.361       11.906      1.271872663x
> StringIndexOf.searchCharLongSuccess        4.216       4.218       1.000474383x
> StringIndexOf.searchCharMediumSuccess      3.133       3.216       1.02649218x
> StringIndexOf.searchCharShortSuccess       3.76        3.761       1.000265957x
> StringIndexOf.success                      9.186       9.713       1.057369911x
> StringIndexOf.successBig                   14.341      46.343      3.231504079x
> StringIndexOfChar.latin1_AVX2_String       6220.918    12154.52    1.953814533x
> StringIndexOfChar.latin1_AVX2_char         5503.556    5540.044    1.006629895x
> StringIndexOfChar.latin1_SSE4_String       6978.854    6818.689    0.977049957x
> StringIndexOfChar.latin1_SSE4_char         5657.499    5474.624    0.967675646x
> StringIndexOfChar.latin1_Short_String      7132.541    6863.359    0.962260014x
> StringIndexOfChar.latin1_Short_char        16013.389   16162.437   1.009307711x
> StringIndexOfChar.latin1_mixed_String      7386.123    14771.622   1.999915517x
> StringIndexOfChar.latin1_mixed_char        9901.671    9782.245    0.987938803

Scott Gibbons has
updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/3e150fe3..57e115d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=48-49 Stats: 6 lines in 2 files changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From epeter at openjdk.org Thu May 30 15:37:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 15:21:10 GMT, Scott Gibbons wrote: > Done. I still see the numbers `4162796 4162796`. I'm not sure if this bug number is relevant. But certainly it should only be mentioned once ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620960158 From epeter at openjdk.org Thu May 30 15:37:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:18 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> Message-ID: On Thu, 30 May 2024 15:30:26 GMT, Emanuel Peter wrote: >> Done. > >> Done. > > I still see the numbers `4162796 4162796`. I'm not sure if this bug number is relevant. But certainly it should only be mentioned once ;) I never add old bug number to new tests... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620963284 From epeter at openjdk.org Thu May 30 15:37:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:20 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 15:30:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > 23: > 24: /* @test > 25: * @bug 4162796 4162796 8320448 Suggestion: * @bug 
8320448 test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 33: > 31: > 32: /* @test > 33: * @bug 4162796 4162796 8320448 Suggestion: * @bug 8320448 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620964138 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620964720 From epeter at openjdk.org Thu May 30 15:37:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 15:37:20 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 15:33:16 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: > >> 23: >> 24: /* @test >> 25: * @bug 4162796 4162796 8320448 > > Suggestion: > > * @bug 8320448 As I said above: I never add old bug numbers to new tests. But here it is even duplicated ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620966568 From stuefe at openjdk.org Thu May 30 15:47:08 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 30 May 2024 15:47:08 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. 
> Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in 
VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... I agree with David on the 24hr thing. We want others to stick to that rule, then we should keep the rule ourselves. The rule takes the pressure out of monitoring the patch flow. But @TheRealMDoerr is right, the only logical way we can see a nullptr here is if there are no waiters/notifiers. A better solution may have been to move the memsets into their respective count > 0 conditions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19450#issuecomment-2140003043 From sgibbons at openjdk.org Thu May 30 15:48:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:48:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v51] In-Reply-To: References: Message-ID: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix bug number in tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/57e115d7..6eae46e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=50 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=49-50 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Thu May 30 15:48:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:48:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] 
In-Reply-To: References: Message-ID: <22JtxwmXnPAAUHF8c3g6lmvUtymzGr6Ekib_nUAKbW4=.3315da8b-09bc-4534-9f27-0fe1485456c7@github.com> On Thu, 30 May 2024 15:34:17 GMT, Emanuel Peter wrote: >> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 25: >> >>> 23: >>> 24: /* @test >>> 25: * @bug 4162796 4162796 8320448 >> >> Suggestion: >> >> * @bug 8320448 > > As I said above: I never add old bug numbers to new tests. But here it is even duplicated ;) The file I used as baseline for this `test/jdk/java/lang/StringBuffer/IndexOf.java` has the bug number listed twice (copy/paste). I'll remove it from here, but leave it in the original unless requested to change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620985844 From sgibbons at openjdk.org Thu May 30 15:48:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 15:48:50 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v50] In-Reply-To: References: Message-ID: <3nJczHjyjWVNAlPneM19NW6Dc0MRql6sDE2hX4tyZpc=.3539eed5-c871-422c-806b-1f2d5bcbae2f@github.com> On Thu, 30 May 2024 15:33:27 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 33: > >> 31: >> 32: /* @test >> 33: * @bug 4162796 4162796 8320448 > > Suggestion: > > * @bug 8320448 Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620988308 From epeter at openjdk.org Thu May 30 16:10:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 16:10:17 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v51] In-Reply-To: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> References: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> Message-ID: On Thu, 30 May 2024 15:48:50 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one 
additional commit since the last revision: > > Fix bug number in tests Ok, now it is good for me. But I would definitely wait with integration until after the fork next week. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 2: > 1: /* > 2: * Copyright (c) 2023, 2024 Intel Corporation. All rights reserved. Is the 2023 year intentional? I don't know your policy, so you can just ignore this ;) src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 334: > 332: // NUMBER_OF_CASES (currently 10) needle sizes for both big and small. There are special > 333: // routines for handling needle sizes > NUMBER_OF_CASES (L_{big,small}CaseDefault). These > 334: // cases use C@'s arrays_equals() to compare the needle to the haystack. The small cases Suggestion: // cases use C2's arrays_equals() to compare the needle to the haystack. The small cases Randomly spotted this. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 773: > 771: // jae done > 772: // > 773: // Final index of start of needle @((16 - (ndlLen %16)) & 0xf) << 1 What is the meaning of the `@`? Maybe `at`. I'd use the same consistently ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2088739965 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621015782 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621017548 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621019611 From sgibbons at openjdk.org Thu May 30 16:16:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 16:16:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v52] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2.
The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix copyright & a couple of comment typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/6eae46e5..f432320f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=51 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=50-51 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Thu May 30 16:16:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 16:16:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using 
AVX2 [v49] In-Reply-To: <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> Message-ID: On Thu, 30 May 2024 15:16:34 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with two additional commits since the last revision: >> >> - Stupid EOL at end >> - Add @test block; fix test indentation > > About the fuzzer: we have it in our closed tests. But I think it comes from this: https://github.com/shipilev/JavaFuzzer I agree with @eme64 to postpone the integration after JDK 23 is forked in one week. It is not about how you confident with code. It is size of code. I did only limited (tier1-4) testing in our infra which did not cover all our testing configuration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2140103757 From sgibbons at openjdk.org Thu May 30 16:16:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 16:16:45 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v51] In-Reply-To: References: <73yhW7umbpUKGvfaJ5hkzLjIQ6_8hakVYD59s0-60OY=.321f0126-06a2-4efc-a271-80a518c53baa@github.com> Message-ID: <1veKa8k9a_OgFxuy0XD_MPxOHgGpy8LXTgG6gEPfXiU=.3ed8e416-4267-40c5-8daf-8a9517f51557@github.com> On Thu, 30 May 2024 16:03:29 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug number in tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2023, 2024 Intel Corporation. All rights reserved. > > Is the 2023 year intentional? 
I don't know your policy, so you can just ignore this ;) I started this in November :-) > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 334: > >> 332: // NUMBER_OF_CASES (currently 10) needle sizes for both big and small. There are special >> 333: // routines for handling needle sizes > NUMBER_OF_CASES (L_{big,small}CaseDefault). These >> 334: // cases use C@'s arrays_equals() to compare the needle to the haystack. The small cases > > Suggestion: > > // cases use C2's arrays_equals() to compare the needle to the haystack. The small cases > > Randomly spotted this. Fixed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 773: > >> 771: // jae done >> 772: // >> 773: // Final index of start of needle @((16 - (ndlLen %16)) & 0xf) << 1 > > What is the meaning of the `@`? Maybe `at`. I'd use the same consistently Changed to "at". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621034441 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621034583 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1621034821 From epeter at openjdk.org Thu May 30 16:23:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 May 2024 16:23:34 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> Message-ID: <9Gep5o1EEF96gprsHB1vDiw8KSQON-c6uh_9gBJyq9c=.43962158-2f23-4929-9e72-d4827a4fa5e8@github.com> On Thu, 30 May 2024 16:16:59 GMT, Scott Gibbons wrote: >> I agree with @eme64 to postpone the integration after JDK 23 is forked in one week. It is not about how you confident with code. It is size of code. I did only limited (tier1-4) testing in our infra which did not cover all our testing configuration. > > @vnkozlov OK. I'll defer to you all. 
I've contacted the author of the fuzzer to see what I can do to set up a local instance. Would this be sufficient to increase confidence for future submissions? We can run it perpetually on fixes (provided I can set it up). Had I done that, we could have had 6 months of fuzzing on top of our tests. Would that have alleviated this concern? @asgibbons I generally just stop pushing ANY RFE's a week or two before the fork. Even if you did run the fuzzer with it - there are often last-minute changes. And your code here is rather large, so even if you are confident, there must be at least one bug hiding. Running the fuzzer is nice as pre-integration, but it mostly only catches things post-integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2140136262 From sgibbons at openjdk.org Thu May 30 16:23:34 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 30 May 2024 16:23:34 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v49] In-Reply-To: References: <9PIuILHZnLHrZf1sz0Dsq6iup6qgyXw50mD0nGVS04c=.63bd0afd-d818-46fa-a082-a3d2066829cd@github.com> <4ZM8wZFYPZjIbjb_O6n6DNAlpYOa2EHfmhSZHVUAXNA=.b923e319-f143-4a4c-9916-face36f337db@github.com> Message-ID: On Thu, 30 May 2024 16:10:53 GMT, Vladimir Kozlov wrote: >> About the fuzzer: we have it in our closed tests. But I think it comes from this: https://github.com/shipilev/JavaFuzzer > > I agree with @eme64 to postpone the integration after JDK 23 is forked in one week. It is not about how you confident with code. It is size of code. I did only limited (tier1-4) testing in our infra which did not cover all our testing configuration. @vnkozlov OK. I'll defer to you all. I've contacted the author of the fuzzer to see what I can do to set up a local instance. Would this be sufficient to increase confidence for future submissions? We can run it perpetually on fixes (provided I can set it up). Had I done that, we could have had 6 months of fuzzing on top of our tests. 
Would that have alleviated this concern? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2140124882 From kbarrett at openjdk.org Thu May 30 17:59:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 30 May 2024 17:59:02 GMT Subject: RFR: 8333133: Simplify QuickSort::sort In-Reply-To: References: Message-ID: On Thu, 30 May 2024 11:41:31 GMT, Aleksey Shipilev wrote: >> The "idempotent" argument is removed from that function, with associated >> simplifications to the implementation. Callers are updated to remove that >> argument. Callers that were providing a false value are unaffected in their >> behavior. The 3 callers that were providing a true value to request the >> associated feature are also unaffected (other than by being made faster), >> because the arrays involved don't contain any equivalent pairs. >> >> There are also some miscellaneous cleanups, including using the swap utility >> and fixing some comments. >> >> Testing: mach5 tier1-3 > > src/hotspot/share/utilities/quickSort.hpp line 75: > >> 73: for ( ; true; ++left_index, --right_index) { >> 74: for ( ; comparator(array[left_index], pivot_val) < 0; ++left_index) { >> 75: assert(left_index < (length - 1), "reached end of partition"); > > Let me see if I understand this change. It makes assert stronger: we do not accept `left_index == length - 1` anymore. I guess that would mean the pivot is at the last element? Which makes the partition is empty, which cannot happen? The reason I looked at the assert carefully in the first place was that `left_index < length` could pass and yet we could still do `array[length]`, which could be UB. So the tightened condition must be correct, else we have a bug. Proving we don't have a bug is a little bit harder. We know that the last element in the sequence can't be less than the pivot, because of the way find_pivot works. 
It arranges for the first, middle, and last values in the sequence to be ordered A[0] <= A[pivot] <= A[length-1]. The tricky case is the sequence [Vi..., P, P], where P is the pivot and all Vi < P. The from-left scan proceeds until it reaches the left occurrence of P. The from-right scan immediately stops on the right occurrence of P. The two P's are swapped. left_index is incremented and right_index is decremented. The from-left scan is restarted with left_index == length - 1, stops immediately because P < P is false, and does not execute the assert. Similarly, the from-right scan is restarted with right_index == length - 2, and also stops because P > P is false. And now the outer loop terminates. A similar argument applies for the from-right scan, where we already assert right_index > 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19464#discussion_r1621217067 From amenkov at openjdk.org Thu May 30 18:15:09 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 30 May 2024 18:15:09 GMT Subject: Integrated: 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 22:59:43 GMT, Alex Menkov wrote: > Some cleanup related to JvmtiEnvBase::get_threadOop_and_JavaThread method > > Testing: tier1-6 This pull request has now been integrated. 
Changeset: 44c1845a Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/44c1845ae7fdff524d4a60a51362834cfea5c5da Stats: 43 lines in 3 files changed: 3 ins; 11 del; 29 mod 8330852: All callers of JvmtiEnvBase::get_threadOop_and_JavaThread should pass current thread explicitly Reviewed-by: sspitsyn, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/18986 From cjplummer at openjdk.org Thu May 30 19:02:06 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 30 May 2024 19:02:06 GMT Subject: RFR: 8332917: failure_handler should execute gdb "info threads" command on linux In-Reply-To: References: Message-ID: On Fri, 24 May 2024 19:45:21 GMT, Chris Plummer wrote: > On linux, failure_handler dumps stack traces for all threads, but this dump does not include the name of each thread. The gdb "info threads" command will give a summary of all threads, and if debugging process, the summary will include each thread's name. If debugging a core file, for some reason the thread name is not included, but the summary is still useful. > > Tested by running some tests that fail with a timeout, and looking at the failure_handler gdb output for both the process and the core file. Thanks for the reviews Leonid and Serguei! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19401#issuecomment-2140687522 From cjplummer at openjdk.org Thu May 30 19:02:07 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 30 May 2024 19:02:07 GMT Subject: Integrated: 8332917: failure_handler should execute gdb "info threads" command on linux In-Reply-To: References: Message-ID: On Fri, 24 May 2024 19:45:21 GMT, Chris Plummer wrote: > On linux, failure_handler dumps stack traces for all threads, but this dump does not include the name of each thread. The gdb "info threads" command will give a summary of all threads, and if debugging process, the summary will include each thread's name. 
If debugging a core file, for some reason the thread name is not included, but the summary is still useful. > > Tested by running some tests that fail with a timeout, and looking at the failure_handler gdb output for both the process and the core file. This pull request has now been integrated. Changeset: ec88c6a8 Author: Chris Plummer URL: https://git.openjdk.org/jdk/commit/ec88c6a872a97cee1cde8844f5ee6834023a10c6 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8332917: failure_handler should execute gdb "info threads" command on linux Reviewed-by: lmesnik, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/19401 From pchilanomate at openjdk.org Thu May 30 19:06:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 May 2024 19:06:03 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v2] In-Reply-To: <55rWd_Kn3Jf8kfmkMtVnzRVs_o0KK_jnuZthiS9awDA=.555b5928-38d1-422c-9014-7d4cf31a950d@github.com> References: <55rWd_Kn3Jf8kfmkMtVnzRVs_o0KK_jnuZthiS9awDA=.555b5928-38d1-422c-9014-7d4cf31a950d@github.com> Message-ID: On Thu, 30 May 2024 02:31:29 GMT, Serguei Spitsyn wrote: >> Please, review the following `interp-only` issue related to carrier threads. >> There are 3 problems fixed here: >> - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()` which is incorrect when we have a deal with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. >> - The `state->is_pending_interp_only_mode())` was processed at mounts only but it has to be processed for unmounts as well. >> - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. 
However, it can happen if the virtual thread gets unmounted. >> >> The fix also includes new test case `vthread/CarrierThreadEventNotification` developed by Patricio. >> >> Testing: >> - Ran new test case locally >> - Ran mach5 tiers 1-6 > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: addressed nits in new test Hi Serguei, Thanks for fixing this one. src/hotspot/share/prims/jvmtiThreadState.cpp line 674: > 672: } > 673: // enable interp_only_mode for carrier thread if it has pending bit > 674: process_pending_interp_only(thread); So for the last unmount case we will call this before doing the JVMTI state rebinding, but shouldn't it be called after it in VTMS_vthread_end? Actually, why not move this call inside rebind_to_jvmti_thread_state_of()? ------------- PR Review: https://git.openjdk.org/jdk/pull/19438#pullrequestreview-2089163431 PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1621298531 From never at openjdk.org Thu May 30 20:42:12 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 30 May 2024 20:42:12 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC Message-ID: This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core.
------------- Commit messages: - 8333300: [JVMCI] add support for generational ZGC Changes: https://git.openjdk.org/jdk/pull/19490/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19490&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333300 Stats: 241 lines in 14 files changed: 193 ins; 10 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/19490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19490/head:pull/19490 PR: https://git.openjdk.org/jdk/pull/19490 From szaldana at openjdk.org Thu May 30 20:45:25 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Thu, 30 May 2024 20:45:25 GMT Subject: RFR: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive Message-ID: Hi folks, This PR addresses [8332785](https://bugs.openjdk.org/browse/JDK-8332785) replacing all naked uses for ```UseSharedSpaces``` with ```CDSConfig::is_using_archive```. Testing: - [x] Tier 1 with GHA. Thanks, Sonia ------------- Commit messages: - Missed include statement in vmError_windows.cpp - 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive Changes: https://git.openjdk.org/jdk/pull/19463/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19463&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332785 Stats: 99 lines in 35 files changed: 8 ins; 0 del; 91 mod Patch: https://git.openjdk.org/jdk/pull/19463.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19463/head:pull/19463 PR: https://git.openjdk.org/jdk/pull/19463 From dnsimon at openjdk.org Thu May 30 20:51:02 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 30 May 2024 20:51:02 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC In-Reply-To: References: Message-ID: On Thu, 30 May 2024 20:37:09 GMT, Tom Rodriguez wrote: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. 
JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. Marked as reviewed by dnsimon (Reviewer). src/hotspot/share/jvmci/jvmci_globals.cpp line 233: > 231: // Check if selected GC is supported by JVMCI and Java compiler > 232: if (!gc_supports_jvmci()) { > 233: fatal("JVMIC does not support the selected GC"); JVMIC -> JVMCI ------------- PR Review: https://git.openjdk.org/jdk/pull/19490#pullrequestreview-2089344199 PR Review Comment: https://git.openjdk.org/jdk/pull/19490#discussion_r1621411444 From kvn at openjdk.org Thu May 30 22:01:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 30 May 2024 22:01:02 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC In-Reply-To: References: Message-ID: On Thu, 30 May 2024 20:37:09 GMT, Tom Rodriguez wrote: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. 
I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. `nmethod.cpp` changes are fine. ZGC changes have to be reviewed by the GC group. How did you test it to exercise the new code (GenZGC + Graal)? Also, it broke the RISC-V build, based on the GHA failure for cross compilation. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19490#pullrequestreview-2089434008 PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2140924453 From dholmes at openjdk.org Thu May 30 23:48:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 May 2024 23:48:08 GMT Subject: RFR: 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries Message-ID: By using the `int*` type the assert could fail if the lower 32 bits of the function address were all zero. The trivial fix is to change to a type that is guaranteed to be the right size: `intptr_t*`. Testing was done manually - see the JBS issue. I also ran tier4 testing as a sanity check, as it includes `-Xcheck:jni`. Thanks.
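[Editorial sketch, not part of the original message.] The failure mode described above can be illustrated with a small standalone example. It is only a sketch under stated assumptions: the table slot is modelled as a plain 64-bit integer, and the names `read_slot_as`, `slot_nonzero_narrow` and `slot_nonzero_wide` are hypothetical and do not come from the HotSpot sources. On a little-endian 64-bit platform, a 32-bit view of a slot whose low half is zero reads as zero even though the full value is a valid non-null address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies the first sizeof(T) bytes of a 64-bit slot, which is what a
// dereference through a T* pointing at that slot would observe.
// memcpy is used instead of a pointer cast to stay within aliasing rules.
template <typename T>
T read_slot_as(const std::uint64_t& slot) {
    T value;
    std::memcpy(&value, &slot, sizeof(T));
    return value;
}

// Hypothetical analogue of "assert(*lastPtr != 0)" with an int-sized view:
// only half of the slot is inspected.
inline bool slot_nonzero_narrow(const std::uint64_t& slot) {
    return read_slot_as<std::int32_t>(slot) != 0;
}

// The same check with a view as wide as the slot (the role intptr_t plays
// on a 64-bit platform): every byte of the slot is inspected.
inline bool slot_nonzero_wide(const std::uint64_t& slot) {
    return read_slot_as<std::int64_t>(slot) != 0;
}
```

With a value such as `0x00007f1200000000`, the wide check reports a non-zero entry, while the narrow check on a little-endian machine sees only the zero low half, which matches the spurious assertion failure reported in the issue.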
------------- Commit messages: - 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries Changes: https://git.openjdk.org/jdk/pull/19491/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19491&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332935 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19491.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19491/head:pull/19491 PR: https://git.openjdk.org/jdk/pull/19491 From dholmes at openjdk.org Fri May 31 00:19:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 May 2024 00:19:07 GMT Subject: RFR: 8333149: ubsan : memset on nullptr target detected in jvmtiEnvBase.cpp get_object_monitor_usage In-Reply-To: References: Message-ID: On Wed, 29 May 2024 09:09:16 GMT, Matthias Baesken wrote: > When running with ubsan - enabled binaries (--enable-ubsan), > in the vmTestbase/nsk/jdi tests some cases of memset on nullptr destinations are detected in get_object_monitor_usage . > > // null out memory for robustness > memset(ret.waiters, 0, ret.waiter_count * sizeof(jthread *)); > memset(ret.notify_waiters, 0, ret.notify_waiter_count * sizeof(jthread *)); > > probably we should add checks there. 
> Example : > vmTestbase/nsk/jdi/ObjectReference/entryCount/entrycount002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1560:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7ffb2568559c in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1560 > debugee.stderr> #1 0x7ffb27987bd7 in VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7ffb28ddc2dd in VM_Operation::evaluate() src/hotspot/share/runtime/vmOperations.cpp:75 > debugee.stderr> #3 0x7ffb28deac41 in VMThread::evaluate_operation(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:283 > debugee.stderr> #4 0x7ffb28decc4f in VMThread::inner_execute(VM_Operation*) src/hotspot/share/runtime/vmThread.cpp:427 > debugee.stderr> #5 0x7ffb28ded7b9 in VMThread::loop() src/hotspot/share/runtime/vmThread.cpp:493 > debugee.stderr> #6 0x7ffb28ded8a7 in VMThread::run() src/hotspot/share/runtime/vmThread.cpp:177 > debugee.stderr> #7 0x7ffb28b7e31a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > debugee.stderr> #8 0x7ffb281c4971 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:846 > debugee.stderr> #9 0x7ffb2df416e9 in start_thread (/lib64/libpthread.so.0+0xa6e9) (BuildId: 2f8d3c2d0f4d7888c2598d2ff6356537f5708a73) > debugee.stderr> #10 0x7ffb2d51550e in clone (/lib64/libc.so.6+0x11850e) (BuildId: f732026552f6adff988b338e92d466bc81a01c37) > > vmTestbase/nsk/jdi/ObjectReference/owningThread/owningthread002/TestDescription.jtr > > debugee.stderr> /src/hotspot/share/prims/jvmtiEnvBase.cpp:1561:11: runtime error: null pointer passed as argument 1, which is declared to never be null > debugee.stderr> #0 0x7f1e070855bb in JvmtiEnvBase::get_object_monitor_usage(JavaThread*, _jobject*, jvmtiMonitorUsage*) src/hotspot/share/prims/jvmtiEnvBase.cpp:1561 > debugee.stderr> #1 0x7f1e09387bd7 in 
VM_GetObjectMonitorUsage::doit() src/hotspot/share/prims/jvmtiEnvBase.hpp:594 > debugee.stderr> #2 0x7f1e0a7dc2dd in VM_Operation::evaluate() src/hotsp... Okay, sorry, the zero case handling here is a little awkward. But it seems to me that if the counts are zero it is expected that memset does nothing, so the value of the pointer passed in is irrelevant. Shame it doesn't specify that. We should skip allocation and the subsequent memset when the count is zero. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19450#issuecomment-2141034070 From iklam at openjdk.org Fri May 31 00:25:02 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 31 May 2024 00:25:02 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> Message-ID: <2cV0qix4YBr6H58RLaKdtiRmmiQ222IFHGH8kw-bWCY=.278bc880-4b3e-481c-90df-dd45a94f7822@github.com> On Thu, 30 May 2024 04:15:24 GMT, Dan Heidinga wrote: >> `preresolve_list` has the original CP indices (E.g., `putfield #123` as stored in the classfile), but in HotSpot, after bytecode rewriting, the u2 following the bytecode is changed to an index into the `cpcache()->_resolved_field_entries` array, so it becomes something like `putfield #45`. So we need to know how to convert the `123` index to the `45` index. >> >> We could walk `_resolved_field_entries` to find the `ResolvedFieldEntry` whose `_cpool_index` is `123`. However, before the `ResolvedFieldEntry` is resolved, we don't know which bytecode is used to resolve it, so we don't know whether it's for a static field or non-static field. 
Since we want to filter out the static fields in the PR, we need to: >> >> - walk the bytecodes to find only getfield/putfield bytecodes >> - these bytecodes will give us an index to the `_resolved_field_entries` array >> - from there, we discover the original CP index >> - then we see if this index is set to true in `preresolve_list` >> >> There's also a compatibility issue -- it's possible to have static and non-static field access using the same CP index: >> >> >> class Hack { >> static int myField; >> int foo(boolean flag) { >> try { >> if (flag) { >> // throw IncompatibleClassChangeError >> return /* pseudo code*/ getfield this.myField; >> } else { >> // OK >> return /* pseudo code*/ getstatic Hack.myField; >> } >> } catch (Throwable) { >> return 5678; >> } >> } >> >> >> So we must call `InterpreterRuntime::resolve_get_put()` which performs all the checks for access rights, static-vs-non-static, etc. This call requires a Method parameter, so we must walk all the Methods to find an appropriate one. >> >> Perhaps it's possible to refactor the `InterpreterRuntime` code to avoid passing in a Method, but I am hesitant to do that with code that deals with access right checks. > >> We could walk `_resolved_field_entries` to find the `ResolvedFieldEntry` whose `_cpool_index` is `123`. However, before the `ResolvedFieldEntry` is resolved, we don't know which bytecode is used to resolve it, so we don't know whether it's for a static field or non-static field. Since we want to filter out the static fields in the PR, we need to: >> >> * walk the bytecodes to find only getfield/putfield bytecodes >> * these bytecodes will give us an index to the `_resolved_field_entries` array >> * from there, we discover the original CP index >> * then we see if this index is set to true in `preresolve_list` > > Something's been bothering me about this explanation and I think I've put my finger on it. 
As you show, the same CP entry can be referenced by both `getstatic` & `getfield` bytecodes though only one will successfully resolve. Walking the bytecodes doesn't actually tell us anything - the resolution status should be different for instance vs static fields which means we should always be safe to attempt the resolution of fields as instance fields provided we ignore errors. > >> So we must call `InterpreterRuntime::resolve_get_put()` which performs all the checks for access rights, static-vs-non-static, etc. This call requires a Method parameter, so we must walk all the Methods to find an appropriate one. > > The Method parameter is necessary for puts to final fields - either `<clinit>` for static finals or an `<init>` method for instance finals. In either case, we don't actually resolve the field for puts so it doesn't matter if we pass the "correct" method or not during pre resolution as it will never successfully complete. I think we'd be OK to send any method we want to that call when doing preresolution provided we ignore the errors. If you look at the version in the Leyden repo, there are many different types of references that are handled in `ClassPrelinker::maybe_resolve_fmi_ref` https://github.com/openjdk/leyden/blob/4faa72029abb86b55cb33b00acf9f3a18ade4b77/src/hotspot/share/cds/classPrelinker.cpp#L307 My goal is to defer all the safety checks to `InterpreterRuntime::resolve_xxx` so that we don't need to think about what is safe to pre-resolve, and what is not. Some of the checks are very complex (see linkResolver.cpp as well) and may change as the language evolves.
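[Editorial sketch, not part of the original message.] The index translation discussed in this thread (from an original constant-pool index such as `#123` to a position in the resolved-field-entries array such as `#45`) can be pictured with a toy lookup. This is only an illustration: `FieldEntrySketch` and `find_entry_for_cp_index` are hypothetical names, the real `ResolvedFieldEntry` carries far more state, and none of the access or static-vs-instance checks performed by `InterpreterRuntime::resolve_get_put()` are modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for a resolved field entry:
// it only remembers which original constant-pool index it was created for.
struct FieldEntrySketch {
    std::uint16_t cpool_index;
};

// Linear walk over the entry array, returning the position of the entry
// created for `cp_index`, or -1 if no entry refers to that index.
inline int find_entry_for_cp_index(const std::vector<FieldEntrySketch>& entries,
                                   std::uint16_t cp_index) {
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (entries[i].cpool_index == cp_index) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

As the thread points out, such a lookup alone does not say whether the entry will be reached through a static or an instance bytecode, which is why the discussion turns to either walking the bytecodes or attempting resolution and ignoring errors.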
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1621541729 From duke at openjdk.org Fri May 31 00:45:33 2024 From: duke at openjdk.org (Jin Guojie) Date: Fri, 31 May 2024 00:45:33 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v4] In-Reply-To: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > (1) The following test has passed, which shows performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this pacth(ns/ops) 1885 improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this pacth(ns/ops) 1885 improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this pacth(ns/ops) 1894 improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this pacth(ns/ops) 1891 improvement(%) 18.03% > > (2) jtreg test has passed > > make run-test? TEST=tier1 Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'openjdk:master' into dev0530 - Merge branch 'dev0530' of https://github.com/jinguojie-alibaba/jdk into dev0530 - Merge branch 'openjdk:master' into dev0530 - MacroAssembler::msub() takes a scratch register as an argument - 8331558: AArch64: optimize integer remainder On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. 
(1) The following test has passed, which shows performance improvement. make test TEST="micro:java.lang.IntegerDivMod" make test TEST="micro:java.lang.LongDivMod" * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this pacth(ns/ops) 1885 improvement(%) 17.93% * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this pacth(ns/ops) 1885 improvement(%) 18.03% * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this pacth(ns/ops) 1894 improvement(%) 17.79% * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this pacth(ns/ops) 1891 improvement(%) 18.03% (2) jtreg test has passed make run-test? TEST=tier1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19471/files - new: https://git.openjdk.org/jdk/pull/19471/files/73c7bdc0..64214599 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19471&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19471&range=02-03 Stats: 2813 lines in 89 files changed: 1416 ins; 1108 del; 289 mod Patch: https://git.openjdk.org/jdk/pull/19471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19471/head:pull/19471 PR: https://git.openjdk.org/jdk/pull/19471 From dholmes at openjdk.org Fri May 31 01:05:14 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 May 2024 01:05:14 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v4] In-Reply-To: References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: On Fri, 31 May 2024 00:45:33 GMT, Jin Guojie wrote: >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> (1) The following test has passed, which shows performance improvement. 
>> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this pacth(ns/ops) 1885 improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this pacth(ns/ops) 1885 improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this pacth(ns/ops) 1894 improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this pacth(ns/ops) 1891 improvement(%) 18.03% >> >> (2) jtreg test has passed >> >> make run-test? TEST=tier1 > > Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'openjdk:master' into dev0530 > - Merge branch 'dev0530' of https://github.com/jinguojie-alibaba/jdk into dev0530 > - Merge branch 'openjdk:master' into dev0530 > - MacroAssembler::msub() takes a scratch register as an argument > - 8331558: AArch64: optimize integer remainder > > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. > > (1) The following test has passed, which shows performance improvement. 
> > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% > > (2) jtreg test has passed > > make run-test? TEST=tier1 The issue 8331558 is already closed and this change cannot be associated with it. If this is a re-do of the original 8331558 fix then it needs a new JBS issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19471#issuecomment-2141065673 From fyang at openjdk.org Fri May 31 01:15:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 01:15:01 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:54:27 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintainance. >> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. >> >> Thanks! >> >> * Tests are still running, so far so good. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > restrict accessbility TBH, I don't think this is a change in the right direction. Classes like `NativeInstruction` and `MacroAssembler` are supposed to be there for different purposes, so I think it's not a issue for them to call each other where necessary. 
It looks really strange to me to move functions like `extract_rs1` from `NativeInstruction` to `MacroAssembler`. It's a thing which is supposed to be in the domain of class `NativeInstruction`. ------------- PR Review: https://git.openjdk.org/jdk/pull/19459#pullrequestreview-2089652180 From sspitsyn at openjdk.org Fri May 31 01:41:17 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 May 2024 01:41:17 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v3] In-Reply-To: References: Message-ID: > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with `nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is the C programming language `null` pointer. > > I think we do not need a CSR for this fix. > > Testing: N/A (not needed) Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: replace nullptr with null pointer in the docs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19257/files - new: https://git.openjdk.org/jdk/pull/19257/files/9fe639e1..4e1c48a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=01-02 Stats: 81 lines in 4 files changed: 0 ins; 0 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/19257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19257/head:pull/19257 PR: https://git.openjdk.org/jdk/pull/19257 From sspitsyn at openjdk.org Fri May 31 01:46:03 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 May 2024 01:46:03 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: References:
<6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> <_CuYvr39rfebBcJRO0AM-2p8yQ2-V0oboFclyxAJ7Mo=.8cdba311-3f93-4c95-ac8b-6d7d41d88e24@github.com> Message-ID: On Fri, 17 May 2024 04:34:22 GMT, Kim Barrett wrote: > But this clarification doesn't actually clarify that the rest of the spec uses nullptr. Based on the proposed wording I would expect things like: > > The function may return nullptr > > to say > > The function may return a null pointer Okay. I've made a fix to replace in the docs `nullptr` with `null pointer` as you suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19257#issuecomment-2141095067 From sspitsyn at openjdk.org Fri May 31 01:46:03 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 May 2024 01:46:03 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v3] In-Reply-To: References: Message-ID: <4eneea7A20vUtgGjOxQ0nd63PjutAtT4UsShvgys0-c=.7f0e6947-0129-4f27-a132-edbae5eedeb2@github.com> On Fri, 17 May 2024 04:47:31 GMT, Alan Bateman wrote: >> Thank you, Kim. I like this suggestion. Updated now. > > That part looks okay but I think all the parameters and error descriptions changed by JDK-8324680 will now need to change to use "null" instead of "nullptr". Okay. I've made a fix to replace in the doc: `nullptr` => `null` pointer as David suggested below. I can reduce it and remove the word 'pointer'. Please, let me know what is better. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19257#discussion_r1621575474 From david.holmes at oracle.com Fri May 31 01:51:17 2024 From: david.holmes at oracle.com (David Holmes) Date: Fri, 31 May 2024 11:51:17 +1000 Subject: [EXTERNAL] Re: External _JAVA_OPTIONS environment variable sourcing for self-contained applications In-Reply-To: <9532fa6e-fcba-49ab-a965-762e3056869b@xpipe.io> References: <1bc8a1a8-5adf-4a00-800c-cfe626608ae6@oracle.com> <918f3a96-cc75-43a5-b19b-fefe063e82ea@oracle.com> <285f99c9-0689-4059-b9c4-860879332465@xpipe.io> <10c34c7d-fedc-4a55-909c-28180fb74093@xpipe.io> <999d912a-68ad-4c5d-8b88-ef93d3b5d6f0@littlepinkcloud.com> <567cd5e4-f0d4-4c69-be66-2e220dc640eb@oracle.com> <9532fa6e-fcba-49ab-a965-762e3056869b@xpipe.io> Message-ID: <7e2f05ec-1083-4933-9600-2bddb0ad7b39@oracle.com> On 30/05/2024 3:51 pm, Christopher Schnick wrote: > Alright I see your points. I can definitely crosspost this thread to the > core libs mailing list. > > The only case in which I see this still being mainly a hotspot issue is > if there is more global configuration creeping into runtime images apart > from environment variables. Is there any other global configuration data > always sourced that I'm not aware of like registry values, Java Control > Panel settings (is that even still around?), other global configuration > files, etc.? At the hotspot-level options come from 4 sources: - the "command-line" as encoded in the args structure passed to JNI_CreateJavaVM - options embedded in the image from jlink - options from the JAVA_TOOL_OPTIONS env var - options from the _JAVA_OPTIONS env var The "command-line" is controlled by the launcher, so potentially a launcher could include additional options that it acquires from arbitrary places (like registries or the file system). But I am not aware of the standard launcher doing that. 
Any source of options could provide the -XX:Flags=flags-file option to read further options from a file (which gets processed first). David ----- > On 30/05/2024 06:30, David Holmes wrote: >> On 29/05/2024 8:05 pm, Andrew Haley wrote: >>> On 5/29/24 09:23, Christopher Schnick wrote: >>> > So is there any update on this? From the existing discussion, it >>> was still not apparent whether the hotspot developers consider this >>> being a problem that should be fixed properly. There were already a >>> few possible solutions proposed in this thread. >>> >>> I don't think there were many that would pass a compatibility and >>> specification review. "Give developers the option to unset these >>> variables in the automatically generated launcher script for jlink" >>> might well be OK, though. It'd be worth a try. >> >> I also think this is something that we should see about fixing in >> jlink, such that the problematic env-vars are omitted. I'm less >> inclined to support the suggestion that a new flag be added to hotspot >> that tells it to ignore the env vars, as you will need to add it in >> jlink anyway. >> >> But again I am not familiar with jlink and the jlink developers do not >> generally hang out on hotspot-dev. So I would suggest filing a JBS >> issue against jlink or starting a discussion on ... core-libs-dev? >> >> David >> From fyang at openjdk.org Fri May 31 02:27:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 02:27:03 GMT Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> Message-ID: On Thu, 30 May 2024 13:50:03 GMT, Robbin Ehn wrote: > With big warning that kernel do not support vector let user run with vector ? That does not make sense to me. In case of JMH performance evaluation, we can simply revert JDK-8316859.
BTW: I see people are offering feedback to the vendor about constraints posed by the older kernels. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19472#issuecomment-2141126350 From dholmes at openjdk.org Fri May 31 02:42:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 May 2024 02:42:02 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v2] In-Reply-To: References: <6Sb8kKpbkh4ylD4u5Zayx2fV0ZaC5aVNicqoX6g_UNA=.7831eabc-905f-489b-87da-68953ec03412@github.com> <_CuYvr39rfebBcJRO0AM-2p8yQ2-V0oboFclyxAJ7Mo=.8cdba311-3f93-4c95-ac8b-6d7d41d88e24@github.com> Message-ID: On Fri, 31 May 2024 01:43:35 GMT, Serguei Spitsyn wrote: > Okay. I've made a fix to replace in the docs nullptr with null pointer as you suggested. What I suggested was > returns _a_ null pointer in place of > returns `nullptr` but "a null pointer" doesn't always look right either, e.g. "was a null pointer" would be better as just "was null". I think using non-code-font "null" to represent the concept of null-ness would be fine: "returns null", "is null", "was null" ------------- PR Comment: https://git.openjdk.org/jdk/pull/19257#issuecomment-2141136079 From dholmes at openjdk.org Fri May 31 05:05:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 May 2024 05:05:06 GMT Subject: RFR: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:12:25 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR addresses [8332785](https://bugs.openjdk.org/browse/JDK-8332785) replacing all naked uses for ```UseSharedSpaces``` with ```CDSConfig::is_using_archive```. > > Testing: > - [x] Tier 1 with GHA. > > Thanks, > Sonia One minor nit but otherwise looks good.
Thanks src/hotspot/share/cds/cdsConfig.cpp line 308: > 306: > 307: bool CDSConfig::has_unsupported_runtime_module_options() { > 308: assert(CDSConfig::is_using_archive(), "this function is only used with -Xshare:{on,auto}"); Nit: you shouldn't need to specify `CDSConfig::` ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19463#pullrequestreview-2089898352 PR Review Comment: https://git.openjdk.org/jdk/pull/19463#discussion_r1621718062 From dholmes at openjdk.org Fri May 31 05:46:01 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 May 2024 05:46:01 GMT Subject: RFR: 8333129: Move ShrinkHeapInSteps flag to Serial GC In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:36:40 GMT, Zhengyu Gu wrote: > A trivial change that moves Serial GC specific flag `ShrinkHeapInSteps` to `serial_globals.hpp` Okay. Thanks. Apologies, I was mistakenly thinking that moving the flag would cause a runtime error for code not using Serial GC, but it will only cause an error if a VM were built without Serial GC support, which I don't think is possible. I will file an RFE to get the GC Tuning guide updated (if @kimbarrett doesn't beat me to it :) ). I also note that component-specific global flags don't advertise the fact they are global-specific. ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19452#pullrequestreview-2089935612 PR Comment: https://git.openjdk.org/jdk/pull/19452#issuecomment-2141270467 From gcao at openjdk.org Fri May 31 05:55:01 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 31 May 2024 05:55:01 GMT Subject: RFR: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> Message-ID: On Thu, 30 May 2024 13:50:03 GMT, Robbin Ehn wrote: >> Because some dev boards only support RVV version 0.7, In [JDK-8316859](https://bugs.openjdk.org/browse/JDK-8316859) we masked the use of HWCAP to probe for RVV extensions, and in the meantime, we can use hwprobe to probe for V extensions in Linux kernel 6.5 and above. But recently we got Banana Pi BPI-F3 board (has RVV1.0), but his kernel is 6.1.15, so the V extensions detected by HWCAP are masked. And we get the warning: `RVV is not supported on this CPU` when we enable UseRVV with the command, and we can't enable UseRVV correctly. 
>> >> Without Patch: >> >> zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV >> OpenJDK 64-Bit Server VM warning: RVV is not supported on this CPU >> bool UseRVV = false {ARCH product} {command line} >> bool UseRVVForBigIntegerShiftIntrinsics = false {ARCH product} {default} >> openjdk version "23-internal" 2024-09-17 >> OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) >> OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) >> >> >> With Patch: >> >> zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV >> bool UseRVV = true {ARCH product} {command line} >> bool UseRVVForBigIntegerShiftIntrinsics = true {ARCH product} {default} >> openjdk version "23-internal" 2024-09-17 >> OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) >> OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) > > Another suggestion, it seems like you can get the triplet mvendorid/marchid/mimpid from /proc/cpuinfo. > So if we can grab those from VM_Version::os_uarch_additional_features() when available and no hwprobe. > We can set those 3, and in VM_Version::vendor_features() check if this is Banana Pi. > With a big warning that the kernel does not support vector, let the user run with vector? > > So that way THEAD is unaffected. > > From random internet user: > > bananapif3:~$ cat /proc/cpuinfo > processor : 0 > hart : 0 > model name : Spacemit(R) X60 > isa : rv64imafdcv_sscofpmf_sstc_svpbmt_zicbom_zicboz_zicbop_zihintpause > mmu : sv39 > mvendorid : 0x710 > marchid : 0x8000000058000001 > mimpid : 0x1000000049772200 @robehn @RealFYang : Thanks for the review. I'm so sorry, I think I missed the fact that kernel support for the Vector extension is only available since 6.5. I agree that the UseRVV option should only be turned on if kernel support for it is guaranteed first, so I think this issue can be closed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19472#issuecomment-2141278959 From fyang at openjdk.org Fri May 31 06:06:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 06:06:01 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines In-Reply-To: References: Message-ID: <2vYlP7K0d8TQe5OKzTZA33rSAWDjIQSsoO8obofaMos=.9da50618-ed09-4e1b-954f-6199b3e34745@github.com> On Wed, 29 May 2024 12:40:05 GMT, Robbin Ehn wrote: > Hi all, please consider! > > Today we do JAL to **dest** if **dest** is in reach (+/- 1 MB). > Using a very small application or running very short time we have fast patchable calls. > But any normal application running longer will increase the code size and code chrun/fragmentation. > So whatever or not you get hot fast calls rely on luck. > > To be patchable and get code cache reach we also emit a stub trampoline which we can point the JAL to. > This would be the common case for a patchable call. > > Code stream: > JAL > Stubs: > AUIPC > LD > JALR > > > > On some CPUs L1D and L1I can't contain the same cache line, which means the tramopline stub can bounce from L1I->L1D->L1I, which is expensive. > Even if you don't have that problem having a call to a jump is not the fastest way. > Loading the address avoids the pitsfalls of cmodx. > > This patch suggest to solve the problems with trampolines, we take small penalty in the naive case of JAL to **dest**, > and instead do by default: > > Code stream: > AUIPC > LD > JALR > Stubs: > > > An experimental option for turning trampolines back on exists. > > It should be possible to enhanced this with the WIP [Zjid](https://github.com/riscv/riscv-j-extension) by changing the JALR to JAL and nop out the auipc+ld (as the current proposal of Zjid forces the I-fetcher to fetch instruction in order (meaning we will avoid a lot issues which arm has)) when in reach and vice-versa. 
> > Numbers from VF2 (I have done them a few times, they are always overall in favor of this patch): > > fop (msec) 2239 | 2128 = 0.950424 > h2 (msec) 18660 | 16594 = 0.889282 > jython (msec) 22022 | 21925 = 0.995595 > luindex (msec) 2866 | 2842 = 0.991626 > lusearch (msec) 4108 | 4311 = 1.04942 > lusearch-fix (msec) 4406 | 4116 = 0.934181 > pmd (msec) 5976 | 5897 = 0.98678 > jython (msec) 22022 | 21925 = 0.995595 > Avg: 0.974112 > fop(xcomp) (msec) 2721 | 2714 = 0.997427 > h2(xcomp) (msec) 37719 | 38004 = 1.00756 > jython(xcomp) ... Interesting. I will take a look and play with it on my machines. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2141289242 From david.holmes at oracle.com Fri May 31 06:19:24 2024 From: david.holmes at oracle.com (David Holmes) Date: Fri, 31 May 2024 16:19:24 +1000 Subject: Structure of the HotSpot Interpreter In-Reply-To: References: Message-ID: <19b2860c-ccdf-475b-a994-143e744305b5@oracle.com> Hi Julian, On 30/05/2024 11:47 pm, Julian Waters wrote: > Hi all, > > I've recently been trying to learn more about HotSpot and studying its > internals, but the structure of the Interpreter seems to elude me > still. I'm aware that HotSpot doesn't use a traditional switch case > (Well, at least not usually, looking at you Zero Port), but how it > functions is more or less still a black box to me. What kind of > dispatch mechanism does it use, for instance? Is it Direct Threaded, > Indirect Threaded, Token Threaded, or something else entirely? Is > there somewhere I can learn about how everything connects together? > I've tried reading the HotSpot documentation online but there doesn't > seem to be an in-depth explanation in them for how it all fits > together, I'd greatly appreciate if someone points me in the right > direction Did you find: https://albertnetymk.github.io/2021/08/03/template_interpreter/ ? 
HTH David > best regards, > Julian From rehn at openjdk.org Fri May 31 06:20:01 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 31 May 2024 06:20:01 GMT Subject: RFR: 8332689: RISC-V: Use load instead of trampolines In-Reply-To: <2vYlP7K0d8TQe5OKzTZA33rSAWDjIQSsoO8obofaMos=.9da50618-ed09-4e1b-954f-6199b3e34745@github.com> References: <2vYlP7K0d8TQe5OKzTZA33rSAWDjIQSsoO8obofaMos=.9da50618-ed09-4e1b-954f-6199b3e34745@github.com> Message-ID: <121iQz2jpvxUKWm9NX0IkAYtQtxHLjXAmee01zv3VAs=.358939c7-cd57-44e9-b74b-66bf31e9f8d8@github.com> On Fri, 31 May 2024 06:02:56 GMT, Fei Yang wrote: > Interesting. I will take a look and play with it on my machines. Thanks. I don't expected this will be done before RDP1, so take your time. Note that this also somewhat fixes issues with out of order machines spectulation. I.e. the **dest** in the trampoline may very well be I-fetched and decoded which may trigger pipeline flush or similar if it's decoded as something crazy to the CPU. (obviously very implementation dependent) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19453#issuecomment-2141304728 From stuefe at openjdk.org Fri May 31 06:40:01 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 31 May 2024 06:40:01 GMT Subject: RFR: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:12:25 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR addresses [8332785](https://bugs.openjdk.org/browse/JDK-8332785) replacing all naked uses for ```UseSharedSpaces``` with ```CDSConfig::is_using_archive```. > > Testing: > - [x] Tier 1 with GHA. > > Thanks, > Sonia Looks good, minus the nit @dholmes-ora mentioned. Please make sure Copyrights are updated. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19463#pullrequestreview-2089995893 From stuefe at openjdk.org Fri May 31 06:53:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 31 May 2024 06:53:04 GMT Subject: RFR: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp In-Reply-To: References: Message-ID: On Wed, 29 May 2024 07:42:01 GMT, Johan Sjölen wrote: >> In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea. >> >> It contains a workaround: >> >> >> // lazily create the single threaded resource area >> // pick a size which is not a standard since the pools don't exist yet >> _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size); >> >> >> It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet. >> >> This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment. >> >> --- >> >> Tests: GHAs. >> I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life. > > Today, the ChunkPools are allocated before main through static initialization. That means that the ChunkPools exists when main starts executing, so this is safe. Thanks @jdksjolen and @sspitsyn !
------------- PR Comment: https://git.openjdk.org/jdk/pull/19425#issuecomment-2141338912 From stuefe at openjdk.org Fri May 31 06:53:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 31 May 2024 06:53:04 GMT Subject: Integrated: 8333047: Remove arena-size-workaround in jvmtiUtils.cpp In-Reply-To: References: Message-ID: On Tue, 28 May 2024 12:36:41 GMT, Thomas Stuefe wrote: > In `JvmtiUtil::single_threaded_resource_area()`, we create a resource area that is supposed to work even if the current thread is not attached yet and there is no associated Thread or the Thread has no valid ResourceArea. > > It contains a workaround: > > > // lazily create the single threaded resource area > // pick a size which is not a standard since the pools don't exist yet > _single_threaded_resource_area = new (mtInternal) ResourceArea(Chunk::non_pool_size); > > > It specifies a non-standard chunk size to circumvent the chunk-pool-based allocation in the RA constructor, ensuring that only malloc is used. This is because in the old days the ChunkPools had been allocated from C-Heap and there was a time window when no chunk pools were live yet. > > This is quirky and a bit ugly. It is also unnecessary since [JDK-8272112](https://bugs.openjdk.org/browse/JDK-8272112) (since JDK 18). We now create chunk pools as global objects, so they are live as soon as the libjvm C++ initialization ran. We can remove this workaround and the comment. > > --- > > Tests: GHAs. > I also manually called this function, and allocated from the resulting ResourceArea, at the very beginning of CreateJavaVM. I made sure that both allocations and follow-up-chunk-allocation worked even this early in VM life. This pull request has now been integrated. 
Changeset: ba323b51 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/ba323b515d8821895356507bdb1e94df0776dd5a Stats: 8 lines in 3 files changed: 0 ins; 3 del; 5 mod 8333047: Remove arena-size-workaround in jvmtiUtils.cpp Reviewed-by: jsjolen, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/19425 From mli at openjdk.org Fri May 31 07:13:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 07:13:03 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:54:27 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintainance. >> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. >> >> Thanks! >> >> * Tests are still running, so far so good. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > restrict accessbility I'm not sure, maybe because you're already familiar with current code, but I have a different opinion. The only uses of `extract_rs1` are in static functions in `nativeInst_riscv.hpp`, so it's just a utility function, and does not matter where it's put. But, e.g. things related to movptr[1/2] are currently separated in nativeInst_riscv and macroAssembler_riscv, which is not right, in one side it brings no benefit for reading the code, in the other side when you modify some code you need do code change at both files, https://github.com/openjdk/jdk/pull/19246 is an example. The only things must be in nativeInst_riscv are the interfaces/fuctions it exposes to outside, but some implementation is not necessary to be in it, and it's better to be put in macroAssembler_riscv, as they're tightly coupled (e.g. movptr things). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141363282 From fyang at openjdk.org Fri May 31 07:43:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 07:43:01 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:54:27 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintainance. >> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. >> >> Thanks! >> >> * Tests are still running, so far so good. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > restrict accessbility I know what you mean. But they play different roles by design. Just think of `MacroAssembler` as producer and `NativeInstruction` as consumer/analyzer of the native instructions. There will surely be some protocols between the them under the hood, which I think is quite normal. BTW: this is also adopted by other CPUs like aarch64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141402623 From mli at openjdk.org Fri May 31 07:56:00 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 07:56:00 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 07:40:14 GMT, Fei Yang wrote: > There will surely be some protocols between the them under the hood. If this protocol means lots of dual direction communication, then we should consider if it's right (from a point of view OO design, it's an obvious code smell for me). 
NativeInstruction could be a wrapper upon MacroAssembler, but not in reverse direction, in that way it makes things complicated and it's not necessary and bring no benefit. > BTW: this is also adopted by other CPUs like aarch64. Yes, we can refer or copy some code from other platforms in case they're well implemented. Please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp#L137, the `extract` is not in nativeInst_aarch64, it's in assembler_aarch64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141421523 From sspitsyn at openjdk.org Fri May 31 08:07:36 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 May 2024 08:07:36 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v4] In-Reply-To: References: Message-ID: <778pi5AHHgXZdUEBV45R0Npj1wZPeuAHwWdrygWR830=.d67d6249-0dcd-4fef-976f-6911432d53f8@github.com> > The following RFE was fixed recently: > [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code > > It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. > This update is to make it clear that `nullptr` is C programming language `null` pointer. > > I think we do not need a CSR for this fix. 
> > Testing: N/A (not needed) Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: more null pointer corrections ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19257/files - new: https://git.openjdk.org/jdk/pull/19257/files/4e1c48a1..48ba8f5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19257&range=02-03 Stats: 50 lines in 1 file changed: 0 ins; 0 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/19257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19257/head:pull/19257 PR: https://git.openjdk.org/jdk/pull/19257 From sspitsyn at openjdk.org Fri May 31 08:07:36 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 May 2024 08:07:36 GMT Subject: RFR: 8326716: JVMTI spec: clarify what nullptr means for C/C++ developers [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 01:41:17 GMT, Serguei Spitsyn wrote: >> The following RFE was fixed recently: >> [8324680](https://bugs.openjdk.org/browse/JDK-8324680): Replace NULL with nullptr in JVMTI generated code >> >> It replaced all the `NULL`'s in the generated spec with`nullptr`. JVMTI agents can be developed in C or C++. >> This update is to make it clear that `nullptr` is C programming language `null` pointer. >> >> I think we do not need a CSR for this fix. >> >> Testing: N/A (not needed) > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: replace nullptr with null pointer in the docs Thanks, David. I've done one more attempt to correct it. Please, let me know if it is wrong. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19257#issuecomment-2141435705 From fyang at openjdk.org Fri May 31 08:19:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 08:19:01 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 07:53:45 GMT, Hamlin Li wrote: > > There will surely be some protocols between the them under the hood. > > If this protocol means lots of dual direction communication, then we should consider if it's right (from a point of view OO design, it's an obvious code smell for me). NativeInstruction could be a wrapper upon MacroAssembler, but not in reverse direction, in that way it makes things complicated and it's not necessary and bring no benefit. I guess that might deserve a broader discussion as it does not seem to be a RISC-V specific issue. > > BTW: this is also adopted by other CPUs like aarch64. > > Yes, we can refer or copy some code from other platforms in case they're well implemented. Please check https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp#L137, the `extract` is not in nativeInst_aarch64, it's in assembler_aarch64. The RISC-V counter part: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L384 The protocol is also there for aarch64 (movptr as an example): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp#L290 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141454177 From shade at openjdk.org Fri May 31 08:26:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 31 May 2024 08:26:03 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: On Thu, 30 May 2024 08:26:56 GMT, Aleksey Shipilev wrote: > > I can run the jcstress test. 
I will run fastdebug build with `java -jar jcstress-latest.jar -tb 24h` Is it the correct command > > Yes, I think so. FTR, I ran Linux AArch64 server release on Graviton 3 instance (ergonomics selects `-AlwaysMergeDMB` there) for 12 hours. Apart from failures from [JDK-8332670](https://bugs.openjdk.org/browse/JDK-8332670), I see no other trouble. Scheduled a quick run with `+AlwaysMergeDMB` as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2141471801 From mli at openjdk.org Fri May 31 08:47:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 08:47:03 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 08:15:28 GMT, Fei Yang wrote: > I guess that might deserve a broader discussion as it does not seem to be a RISC-V specific issue. > I'm not sure if it needs be discussed broadly, as it's an implementation detail, not a shared logic in JVM. Unless someone would also like to refactor the aach64 implementation, but I'm not trying to do that. > The RISC-V counter part: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L384 > Yes, what I mean is `extract` related things are not necessarily in nativeInst_riscv, especially when it causes too much communication bidirectionally between nativeInst and macroAssembler. > The protocol is also there for aarch64 (movptr as an example): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/nativeInst_aarch64.hpp#L290 As I said, we can refer to other platforms in case they're well implemented. But, I see no benefit and reason to put e.g. `patch_addr_in_movptr1` in macroAssembler and `is_movptr1_at` in nativeInst, I can not find the rational to do so, just because aarch64 do the similar thing? seems that's not a good reason; if it is, then how about other platforms, do they all follow aarch64 implementation details? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141514630 From aph at openjdk.org Fri May 31 09:51:04 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 31 May 2024 09:51:04 GMT Subject: RFR: 8331558: AArch64: optimize integer remainder [v4] In-Reply-To: References: <2sQ52bHtUebVvRZ6dd0zC3So9sN2mm40kXaYLm0nm_k=.5ec3561b-8dc4-4666-af9f-c32e19ff1c04@github.com> Message-ID: On Fri, 31 May 2024 00:45:33 GMT, Jin Guojie wrote: >> On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. >> >> (1) The following test has passed, which shows performance improvement. >> >> make test TEST="micro:java.lang.IntegerDivMod" >> make test TEST="micro:java.lang.LongDivMod" >> >> * IntegerDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2223 with this pacth(ns/ops) 1885 improvement(%) 17.93% >> >> * IntegerDivMod.testRemainderUnsigned baseline(ns/ops) 2225 with this pacth(ns/ops) 1885 improvement(%) 18.03% >> >> * LongDivMod.testDivideRemainderUnsigned baseline(ns/ops) 2231 with this pacth(ns/ops) 1894 improvement(%) 17.79% >> >> * LongDivMod.testRemainderUnsigned baseline(ns/ops) 2232 with this pacth(ns/ops) 1891 improvement(%) 18.03% >> >> (2) jtreg test has passed >> >> make run-test? TEST=tier1 > > Jin Guojie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'openjdk:master' into dev0530 > - Merge branch 'dev0530' of https://github.com/jinguojie-alibaba/jdk into dev0530 > - Merge branch 'openjdk:master' into dev0530 > - MacroAssembler::msub() takes a scratch register as an argument > - 8331558: AArch64: optimize integer remainder > > On some Arm processors, a separate multiply/subtract is actually faster than the combined instruction. 
> > (1) The following test has passed, which shows performance improvement. > > make test TEST="micro:java.lang.IntegerDivMod" > make test TEST="micro:java.lang.LongDivMod" > > * IntegerDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2223 > with this pacth(ns/ops) 1885 > improvement(%) 17.93% > > * IntegerDivMod.testRemainderUnsigned > baseline(ns/ops) 2225 > with this pacth(ns/ops) 1885 > improvement(%) 18.03% > > * LongDivMod.testDivideRemainderUnsigned > baseline(ns/ops) 2231 > with this pacth(ns/ops) 1894 > improvement(%) 17.79% > > * LongDivMod.testRemainderUnsigned > baseline(ns/ops) 2232 > with this pacth(ns/ops) 1891 > improvement(%) 18.03% > > (2) jtreg test has passed > > make run-test? TEST=tier1 There's a new issue at https://bugs.openjdk.org/browse/JDK-8333343 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19471#issuecomment-2141636309 From duke at openjdk.org Fri May 31 10:09:03 2024 From: duke at openjdk.org (kuaiwei) Date: Fri, 31 May 2024 10:09:03 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 08:23:51 GMT, Aleksey Shipilev wrote: > > > I can run the jcstress test. I will run fastdebug build with `java -jar jcstress-latest.jar -tb 24h` Is it the correct command > > > > Yes, I think so. > > FTR, I ran Linux AArch64 server release on Graviton 3 instance (ergonomics selects `-AlwaysMergeDMB` there) for 12 hours. Apart from failures from [JDK-8332670](https://bugs.openjdk.org/browse/JDK-8332670), I see no other trouble. Scheduled a quick run with `+AlwaysMergeDMB` as well. Thanks for testing. I'm running jcstress on a neoverse-n2 instance. I got some "soft errs" in console output. Are they real error? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2141690121 From jbhateja at openjdk.org Fri May 31 10:17:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 31 May 2024 10:17:24 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) Message-ID: Summary of changes include with the patch:- 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - Update vm_version_x86.cpp - Post merge clenups. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329031 - Minor modification in UseAPX flag description - Making UseAPX a boolean flag. - 32-bit build fix - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329031 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8329031 - 8329031: CPUID feature detection for APX during VM initialization. 
Changes: https://git.openjdk.org/jdk/pull/18562/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18562&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329031 Stats: 179 lines in 8 files changed: 153 ins; 11 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/18562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18562/head:pull/18562 PR: https://git.openjdk.org/jdk/pull/18562 From duke at openjdk.org Fri May 31 10:17:24 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 31 May 2024 10:17:24 GMT Subject: RFR: 8329031: CPUID feature detection for Advanced Performance Extensions (Intel® APX) In-Reply-To: References: Message-ID: <6SdGX2sJgqS0nv6DzDELFML8Jv0GE9BHJwxH54UdQTs=.55228610-14a9-4102-ad46-6bd49c0e1f81@github.com> On Mon, 1 Apr 2024 12:01:27 GMT, Jatin Bhateja wrote: > Summary of changes include with the patch:- > > 1) CPUID based feature detection check for Intel APX extension (https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html) > 2) Validation during VM initialization for extended GPRs state save / restoration by OS across context switches of java application threads executing JIT compiled code with new APX ISA. > > Kindly review and share your feedback. > > Best Regards, > Jatin Hi @jatin-bhateja, Can you merge with the latest since PR #18476 is in now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18562#issuecomment-2128187189 From fyang at openjdk.org Fri May 31 10:55:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 10:55:01 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 08:43:17 GMT, Hamlin Li wrote: > As I said, we can refer to other platforms in case they're well implemented. But, I see no benefit and reason to put e.g. 
`movptr1` and `patch_addr_in_movptr1` in macroAssembler and `is_movptr1_at` in nativeInst, I can not find the rational to do so, just because aarch64 do the similar thing? seems that's not a good reason; if it is, then how about other platforms, do they all follow aarch64 implementation details? Anyway, it's an implementation detail, we don't have to follow other platforms unless it's a good design choice for riscv too. Apparently that's not what I mean. I just want say that it's a commonly adopted approach by other CPUs not just aarch64 and I am just fine with it. But your opinion from a OO design perspective does make sense to me. Anyway, here might be a compromise for us when I see you still kept following interfaces for `NativeInstruction`: bool is_jal() const { return MacroAssembler::is_jal_at(addr_at(0)); } bool is_movptr() const { return MacroAssembler::is_movptr1_at(addr_at(0)) || MacroAssembler::is_movptr2_at(addr_at(0)); } bool is_movptr1() const { return MacroAssembler::is_movptr1_at(addr_at(0)); } bool is_movptr2() const { return MacroAssembler::is_movptr2_at(addr_at(0)); } bool is_auipc() const { return MacroAssembler::is_auipc_at(addr_at(0)); } bool is_call() const { return MacroAssembler::is_call_at(addr_at(0)); } bool is_jump() const { return MacroAssembler::is_jump_at(addr_at(0)); } Can you do similar thing for other interfaces like `MacroAssembler::is_li16u_at`? Like `NativeInstruction::is_li16u` which delegates work to `MacroAssembler::is_li16u_at`. And we use these interfaces from `NativeInstruction` where appropriate at the callsites like in file src/hotspot/cpu/riscv/gc/z/zBarrierSetAssembler_riscv.cpp. Hopefully, the code will be more unified in style and resolves both of our concerns. 
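To make the proposed compromise concrete (an illustration only, not code from the PR): a minimal, self-contained sketch of a `NativeInstruction`-style wrapper whose predicates delegate one way to static decoders on a `MacroAssembler`-style class. `MacroAssemblerSketch` and `NativeInstructionSketch` are hypothetical stand-ins, and the decoders are assumed to look only at the 7-bit major opcode (0x6f for JAL, 0x17 for AUIPC per the RISC-V base ISA); the real HotSpot helpers may check more than that.

```cpp
#include <cstdint>

using address = const uint8_t*;

// Hypothetical stand-in for the decoding helpers that the refactoring
// keeps on the assembler side ("producer" of instructions).
struct MacroAssemblerSketch {
  // RISC-V instructions are little-endian, so the major opcode is the
  // low 7 bits of the first byte.
  static uint32_t extract_opcode(address instr) { return instr[0] & 0x7fu; }
  static bool is_jal_at(address instr)   { return extract_opcode(instr) == 0x6f; }
  static bool is_auipc_at(address instr) { return extract_opcode(instr) == 0x17; }
};

// Hypothetical stand-in for NativeInstruction ("consumer/analyzer"):
// it owns no decoding logic and only forwards to the assembler helpers,
// so the dependency runs in one direction.
class NativeInstructionSketch {
  address _addr;
 public:
  explicit NativeInstructionSketch(address a) : _addr(a) {}
  address addr_at(int offset) const { return _addr + offset; }
  bool is_jal() const   { return MacroAssemblerSketch::is_jal_at(addr_at(0)); }
  bool is_auipc() const { return MacroAssemblerSketch::is_auipc_at(addr_at(0)); }
};
```

Call sites would then ask `NativeInstructionSketch(pc).is_jal()` rather than reaching into the assembler class directly, which is the unification in style suggested above; an `is_li16u()` forwarding to `is_li16u_at` would follow the same shape.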
------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141756406 From shade at openjdk.org Fri May 31 11:07:04 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 31 May 2024 11:07:04 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v5] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 10:06:06 GMT, kuaiwei wrote: > Thanks for testing. I'm running jcstress on a neoverse-n2 instance. I got some "soft errs" in console output. Are they real error? Soft errors are API inconsistencies that jcstress harness handled. As long as you don't have new hard errors, you are fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19278#issuecomment-2141786102 From shade at openjdk.org Fri May 31 11:07:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 31 May 2024 11:07:05 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v6] In-Reply-To: References: Message-ID: <-EyonABDkbgLwga8XTUDIaJeKhd8qwOmMmYtjFg90RQ=.eb6aebe4-b3e0-4225-9413-57a37730643c@github.com> On Thu, 30 May 2024 07:45:31 GMT, kuaiwei wrote: >> he origin patch for https://bugs.openjdk.org/browse/JDK-8324186 has 2 issues: >> 1 It show regression in some platform, like Apple silicon in mac os >> 2 Can not handle instruction sequence like "dmb.ishld; dmb.ishst; dmb.ishld; dmb.ishld" >> >> It can be fixed by: >> 1 Enable AlwaysMergeDMB by default, only disable it in architecture we can see performance improvement (N1 or N2) >> 2 Check the special pattern and merge the subsequent dmb. >> >> It also fix a bug when code buffer is expanding, st/ld/dmb can not be merged. I added unit tests for these. >> >> This patch still has a unhandled case. Insts like "dmb.ishld; dmb.ishst; dmb.ish", it will merge the last 2 instructions and can not merge all three. Because when emitting dmb.ish, if merge all previous dmbs, the code buffer will shrink the size. 
I think it may break some resumption and think it's not a common pattern. >> >> In previous PR https://github.com/openjdk/jdk/pull/18467 , I tried an implementation to use state machine for merging. But it looks risky to pending instruction during emitting. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Add comment in aarch64.ad src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 153: > 151: Assembler::bind(L); > 152: code()->clear_last_insn(); > 153: code()->set_last_label(pc()); OK, so we have added `_last_label` to shared code in `codeBuffer`, but only update it in aarch64. This would be surprising for other platforms. On the other hand, this is what we already do with `_last_insn` -- only implementing it for specific platforms. Probably fine, but it would be nice to strengthen this with asserts, maybe in separate PR. test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 211: > 209: constexpr uint32_t test_encode_dmb_ld = 0xd50339bf; > 210: constexpr uint32_t test_encode_dmb_st = 0xd5033abf; > 211: constexpr uint32_t test_encode_dmb = 0xd5033bbf; Can you maybe move these to the top, and use these constants across the test? You would not need the comments like `0xd5033abf, // dmb.ishst` then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1622251263 PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1622170658 From shade at openjdk.org Fri May 31 11:07:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 31 May 2024 11:07:06 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v2] In-Reply-To: References: Message-ID: On Wed, 22 May 2024 11:11:59 GMT, kuaiwei wrote: >> test/hotspot/gtest/aarch64/test_assembler_aarch64.cpp line 198: >> >>> 196: } >>> 197: >>> 198: TEST_VM(AssemblerAArch64, merge_ldst) { >> >> This test seems to be irrelevant for the issue at hand? 
Tests `ld/st` -> `ldp/stp` merging, not the barrier merges? > > In this patch, I fixed an issue, dmb/st/ld may not merge if CodeBuffer is expanding, I added some unit tests to check it. Aha. So the logic is that the same issue that affects `dmb` merging also affects `ldp` merging? All right, it's fine to leave it here then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1622171573 From mli at openjdk.org Fri May 31 11:50:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 11:50:07 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 10:48:26 GMT, Fei Yang wrote: > Like `NativeInstruction::is_li16u` which delegates work to `MacroAssembler::is_li16u_at`. I don't find `NativeInstruction::is_li16u`, maybe you want to say something else for the delegation you mentioned? Take `MacroAssembler::is_li16u_at` as an example, I moved it to macroAssembler, because in macroAssembler it's used too. So one of the principles I'd like to stick to in this refactoring is to make these 2 classes' communication unidirectional, so maybe it's better to move `MacroAssembler::is_li16u_at` too.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141872299 From duke at openjdk.org Fri May 31 11:59:04 2024 From: duke at openjdk.org (kuaiwei) Date: Fri, 31 May 2024 11:59:04 GMT Subject: RFR: 8325821: [REDO] use "dmb.ishst+dmb.ishld" for release barrier [v6] In-Reply-To: <-EyonABDkbgLwga8XTUDIaJeKhd8qwOmMmYtjFg90RQ=.eb6aebe4-b3e0-4225-9413-57a37730643c@github.com> References: <-EyonABDkbgLwga8XTUDIaJeKhd8qwOmMmYtjFg90RQ=.eb6aebe4-b3e0-4225-9413-57a37730643c@github.com> Message-ID: On Fri, 31 May 2024 11:02:31 GMT, Aleksey Shipilev wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment in aarch64.ad > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 153: > >> 151: Assembler::bind(L); >> 152: code()->clear_last_insn(); >> 153: code()->set_last_label(pc()); > > OK, so we have added `_last_label` to shared code in `codeBuffer`, but only update it in aarch64. This would be surprising for other platforms. On the other hand, this is what we already do with `_last_insn` -- only implementing it for specific platforms. Probably fine, but it would be nice to strengthen this with asserts, maybe in separate PR. It reminds me it could be applied to riscv. It also needs to merge membars. I will move this part to a new PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19278#discussion_r1622318763 From fyang at openjdk.org Fri May 31 12:16:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 12:16:02 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 11:47:41 GMT, Hamlin Li wrote: > > Like `NativeInstruction::is_li16u` which delegates work to `MacroAssembler::is_li16u_at`. > > I don't find `NativeInstruction::is_li16u`, maybe you want to say something else for the delegation you mentioned? Never mind. I think I misread the code.
> Take `MacroAssembler::is_li16u_at` as an example, I moved it to macroAssembler, because in macroAssembler it's used too. So one of the principles I'd like to stick to in this refactoring is to make these 2 classes' communication unidirectional, so maybe it's better to move `MacroAssembler::is_li16u_at` too. Yeah. Your change becomes interesting to me now. I am having another check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2141940114 From tanksherman27 at gmail.com Fri May 31 12:16:05 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Fri, 31 May 2024 20:16:05 +0800 Subject: Structure of the HotSpot Interpreter In-Reply-To: References: Message-ID: Hi David, Ah, I remember reading through that article some time ago, and it did get me familiarized with the basic concepts in the Interpreter, but unfortunately it isn't really a deep dive into HotSpot's internals and the nitty gritty of how the code all meshes together. Additionally, there seems to be a little bit of inaccuracy in it (stating that HotSpot is Direct Threaded when it really is Token Threaded, for instance). Thanks for the pointer though! best regards, Julian On Thu, May 30, 2024 at 9:47 PM Julian Waters wrote: > > Hi all, > > I've recently been trying to learn more about HotSpot and studying its > internals, but the structure of the Interpreter seems to elude me > still. I'm aware that HotSpot doesn't use a traditional switch case > (Well, at least not usually, looking at you Zero Port), but how it > functions is more or less still a black box to me. What kind of > dispatch mechanism does it use, for instance? Is it Direct Threaded, > Indirect Threaded, Token Threaded, or something else entirely? Is > there somewhere I can learn about how everything connects together?
> I've tried reading the HotSpot documentation online but there doesn't > seem to be an in-depth explanation in them for how it all fits > together, I'd greatly appreciate if someone points me in the right > direction > > best regards, > Julian From heidinga at openjdk.org Fri May 31 12:24:08 2024 From: heidinga at openjdk.org (Dan Heidinga) Date: Fri, 31 May 2024 12:24:08 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: <2cV0qix4YBr6H58RLaKdtiRmmiQ222IFHGH8kw-bWCY=.278bc880-4b3e-481c-90df-dd45a94f7822@github.com> References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <2cV0qix4YBr6H58RLaKdtiRmmiQ222IFHGH8kw-bWCY=.278bc880-4b3e-481c-90df-dd45a94f7822@github.com> Message-ID: On Fri, 31 May 2024 00:22:37 GMT, Ioi Lam wrote: >>> We could walk `_resolved_field_entries` to find the `ResolvedFieldEntry` whose `_cpool_index` is `123`. However, before the `ResolvedFieldEntry` is resolved, we don't know which bytecode is used to resolve it, so we don't know whether it's for a static field or non-static field. Since we want to filter out the static fields in the PR, we need to: >>> >>> * walk the bytecodes to find only getfield/putfield bytecodes >>> * these bytecodes will give us an index to the `_resolved_field_entries` array >>> * from there, we discover the original CP index >>> * then we see if this index is set to true in `preresolve_list` >> >> Something's been bothering me about this explanation and I think I've put my finger on it. As you show, the same CP entry can be referenced by both `getstatic` & `getfield` bytecodes though only one will successfully resolve. Walking the bytecodes doesn't actually tell us anything - the resolution status should be different for instance vs static fields which means we should always be safe to attempt the resolution of fields as instance fields provided we ignore errors.
>> >>> So we must call `InterpreterRuntime::resolve_get_put()` which performs all the checks for access rights, static-vs-non-static, etc. This call requires a Method parameter, so we must walk all the Methods to find an appropriate one. >> >> The Method parameter is necessary for puts to final fields - either `<clinit>` for static finals or an `<init>` method for instance finals. In either case, we don't actually resolve the field for puts so it doesn't matter if we pass the "correct" method or not during pre resolution as it will never successfully complete. I think we'd be OK to send any method we want to that call when doing preresolution provided we ignore the errors > > If you look at the version in the Leyden repo, there are many different types of references that are handled in `ClassPrelinker::maybe_resolve_fmi_ref` > > https://github.com/openjdk/leyden/blob/4faa72029abb86b55cb33b00acf9f3a18ade4b77/src/hotspot/share/cds/classPrelinker.cpp#L307 > > My goal is to defer all the safety checks to `InterpreterRuntime::resolve_xxx` so that we don't need to think about what is safe to pre-resolve, and what is not. Some of the checks are very complex (see linkResolver.cpp as well) and may change as the language evolves.

The current algorithm says:

    for each bytecode in each method:
      switch(bytecode) {
        case getfield:
        case putfield:
          InterpreterRuntime::resolve_get_put(bc, raw_index, mh, cp, false /*initialize_holder*/, CHECK);
          break;
        ....
      }

What I'm proposing is:

    for each ResolvedFieldEntry
      bool success = InterpreterRuntime::resolve_get_put(getfield, raw_index, nullptr /* mh */, cp, false /*initialize_holder*/, CHECK);
      if (success) {
        // also resolve for put
        InterpreterRuntime::resolve_get_put(putfield, raw_index, nullptr /* mh */, cp, false /*initialize_holder*/, CHECK);
      }

The method is not critical as the current algorithm attempts resolution with multiple methods.
The resolution logic already has to handle this for normal execution and "knows" not to resolve entries (like puts of final fields) regardless of the method as they need to do additional runtime checks. The same will apply to invoke bytecodes later.... it feels safer to do only what the bytecodes in some method have asked for but the runtime already has to be robust against different kinds of gets/puts or invokes targeting the same cp entries. If you really want to only resolve the exact cases (ie: gets not puts, etc) that were resolved in the training run, then we need to write out as part of the classlist more explicitly what needs to be resolved: ie: @cp_resolved_gets 4, 7 8 @cp_resolved_puts 7 8 10 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1622343820 From tanksherman27 at gmail.com Fri May 31 12:24:18 2024 From: tanksherman27 at gmail.com (Julian Waters) Date: Fri, 31 May 2024 20:24:18 +0800 Subject: Structure of the HotSpot Interpreter In-Reply-To: References: Message-ID: Hi Andrew, Thanks for the overview. I unfortunately can't do +PrintInterpreter at the moment since my JDK is experiencing compilation failures everywhere (It's in a bit of a mess right now), but I will try doing that once I've gotten everything fixed. However, I've been digging through the code a little, and I think I see a bit of a pattern. The methods in the Template Table files are all geared towards emitting the executable code into memory, and each of their methods is passed as a pointer to the corresponding bytecode definition to actually emit code into memory. The dispatch mechanism is still a bit of a mystery to me, but from what I can see the code that dispatches to the next bytecode is emitted by dispatch_next. Did I get all of that right, or is there anything I am missing?
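For readers following the dispatch question above, here is a minimal sketch of table-driven ("token threaded") dispatch in plain C++. It is an illustration only and not HotSpot code: the toy opcodes, `Frame`, the handler functions and `run` are all hypothetical names. The sketch returns to a central loop after each handler, whereas, as described in this thread, the template interpreter emits the dispatch sequence itself via `dispatch_next`.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical toy bytecodes; this is not HotSpot's opcode set.
enum Op : std::uint8_t { PUSH1 = 0, ADD = 1, HALT = 2, OP_COUNT = 3 };

// Minimal interpreter state: a bytecode pointer and an operand stack.
struct Frame {
  const std::uint8_t* bcp;
  std::vector<int> stack;
  bool running;
};

using Handler = void (*)(Frame&);

// Each handler does its work and advances the bytecode pointer.
void do_push1(Frame& f) { f.stack.push_back(1); ++f.bcp; }

void do_add(Frame& f) {
  int b = f.stack.back(); f.stack.pop_back();
  int a = f.stack.back(); f.stack.pop_back();
  f.stack.push_back(a + b);
  ++f.bcp;
}

void do_halt(Frame& f) { f.running = false; }

// The dispatch table is indexed by the opcode value (the "token").
const std::array<Handler, OP_COUNT> dispatch_table = { do_push1, do_add, do_halt };

// Fetch the opcode at bcp, look up its handler in the table, and call it.
int run(const std::uint8_t* code) {
  Frame f{code, {}, true};
  while (f.running) {
    dispatch_table[*f.bcp](f);
  }
  return f.stack.back();
}
```

Running the program `{PUSH1, PUSH1, ADD, HALT}` through `run` leaves the sum of the two pushed values on top of the stack. The point of the table is that the bytecode stream stores only small opcode tokens, and the handler address is looked up at dispatch time.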
best regards, Julian On Thu, May 30, 2024 at 9:47 PM Julian Waters wrote: > > Hi all, > > I've recently been trying to learn more about HotSpot and studying its > internals, but the structure of the Interpreter seems to elude me > still. I'm aware that HotSpot doesn't use a traditional switch case > (Well, at least not usually, looking at you Zero Port), but how it > functions is more or less still a black box to me. What kind of > dispatch mechanism does it use, for instance? Is it Direct Threaded, > Indirect Threaded, Token Threaded, or something else entirely? Is > there somewhere I can learn about how everything connects together? > I've tried reading the HotSpot documentation online but there doesn't > seem to be an in-depth explanation in them for how it all fits > together, I'd greatly appreciate if someone points me in the right > direction > > best regards, > Julian From zgu at openjdk.org Fri May 31 13:03:07 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 31 May 2024 13:03:07 GMT Subject: RFR: 8333129: Move ShrinkHeapInSteps flag to Serial GC In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:36:40 GMT, Zhengyu Gu wrote: > A trivial change that moves Serial GC specific flag `ShrinkHeapInSteps` to `serial_globals.hpp` Thanks, @dholmes-ora and @kimbarrett ------------- PR Comment: https://git.openjdk.org/jdk/pull/19452#issuecomment-2142086578 From zgu at openjdk.org Fri May 31 13:03:07 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 31 May 2024 13:03:07 GMT Subject: Integrated: 8333129: Move ShrinkHeapInSteps flag to Serial GC In-Reply-To: References: Message-ID: <-QY-m8IFTr3g5fJKq9E7THR5YDvGRAS93dJsp7LaoIM=.49bde2b6-818c-4b6c-8b1a-7fbdd01c3907@github.com> On Wed, 29 May 2024 12:36:40 GMT, Zhengyu Gu wrote: > A trivial change that moves Serial GC specific flag `ShrinkHeapInSteps` to `serial_globals.hpp` This pull request has now been integrated.
Changeset: 79a78f03 Author: Zhengyu Gu URL: https://git.openjdk.org/jdk/commit/79a78f032effdae40816e7d3e2596dc2b8ef5b9f Stats: 15 lines in 2 files changed: 4 ins; 5 del; 6 mod 8333129: Move ShrinkHeapInSteps flag to Serial GC Reviewed-by: dholmes, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/19452 From rehn at openjdk.org Fri May 31 13:11:04 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 31 May 2024 13:11:04 GMT Subject: RFR: 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:36:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > As discussed, https://github.com/openjdk/jdk/pull/19246#discussion_r1613279908, it's worth to make the code more readable. > For movptr1, add some comments to help understand the tricky part. > For movptr2, it uses the similar (tricky) way as movptr1, so I align the code implementation with movptr1, and try to make it more straightforward. > I tried it, hope it's better. > Thanks. Seems fine to me! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19431#pullrequestreview-2090794567 From fyang at openjdk.org Fri May 31 13:46:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 13:46:04 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Wed, 29 May 2024 18:54:27 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintainance. >> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. >> >> Thanks! >> >> * Tests are still running, so far so good. 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > restrict accessbility LGTM modulo one minor comment. I think you are right in this refactoring work. Sorry for mis-understanding your opinion. Thanks! src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 99: > 97: } > 98: > 99: private: Maybe it's better to make those is_XXX functions public in case of future possible uses in other places? ------------- PR Review: https://git.openjdk.org/jdk/pull/19459#pullrequestreview-2090850751 PR Review Comment: https://git.openjdk.org/jdk/pull/19459#discussion_r1622429210 From mli at openjdk.org Fri May 31 14:40:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 14:40:15 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? > Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintainance. > After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. > > Thanks! > > * Tests are still running, so far so good. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: adjust accessibility ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19459/files - new: https://git.openjdk.org/jdk/pull/19459/files/fe345dd3..ef11d6a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19459&range=02-03 Stats: 17 lines in 1 file changed: 6 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19459.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19459/head:pull/19459 PR: https://git.openjdk.org/jdk/pull/19459 From mli at openjdk.org Fri May 31 14:40:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 14:40:15 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v3] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 13:42:36 GMT, Fei Yang wrote: > LGTM modulo one minor comment. I think you are right in this refactoring work. Sorry for mis-understanding your opinion. Thanks! No worry. Thanks for discussion! > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 99: > >> 97: } >> 98: >> 99: private: > > Maybe it's better to make those is_XXX functions public in case of possible future uses in other places? Yes, modified. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19459#issuecomment-2142393896 PR Review Comment: https://git.openjdk.org/jdk/pull/19459#discussion_r1622521815 From fyang at openjdk.org Fri May 31 14:51:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 14:51:03 GMT Subject: RFR: 8332900: RISC-V: refactor nativeInst_riscv.cpp and macroAssembler_riscv.cpp [v4] In-Reply-To: References: Message-ID: <5I6VbVXsihU-TRX-N4Yn3zzxtJ7dBj-DGcNfmteLxWo=.7ad1bbf0-313f-4cf3-aa51-22bd232e6ca0@github.com> On Fri, 31 May 2024 14:40:15 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? 
>> Currently, code in nativeInst_riscv.cpp and macroAssembler_riscv.cpp call each other, which is not right for readability and maintainance. >> After refactoring, basically only code in nativeInst_riscv.cpp calls code in macroAssembler_riscv.cpp, but not in reverse direction. >> >> Thanks! >> >> * Tests are still running, so far so good. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > adjust accessibility Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19459#pullrequestreview-2091046822 From fyang at openjdk.org Fri May 31 14:55:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 31 May 2024 14:55:03 GMT Subject: RFR: 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:36:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > As discussed, https://github.com/openjdk/jdk/pull/19246#discussion_r1613279908, it's worth to make the code more readable. > For movptr1, add some comments to help understand the tricky part. > For movptr2, it uses the similar (tricky) way as movptr1, so I align the code implementation with movptr1, and try to make it more straightforward. > I tried it, hope it's better. > Thanks. LGTM modulo the typo in code comment. Thanks. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1686: > 1684: // In case of 11th bit of `lower` is 0, it's straightforward to understand. > 1685: // In case of 11th bit of `lower` is 1, it's a bit tricky, to help understand, > 1686: // image divide both `upper` and `lower` into 2 parts respectively, i.e. Suggestion: s/image/imagine/ ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19431#pullrequestreview-2091050276 PR Review Comment: https://git.openjdk.org/jdk/pull/19431#discussion_r1622540261 From gcao at openjdk.org Fri May 31 15:01:08 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 31 May 2024 15:01:08 GMT Subject: Withdrawn: 8333245: RISC-V: UseRVV option can't be enabled after JDK-8316859 In-Reply-To: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> References: <14Zzi3W09YcO5NtfL7gUQwY0NDpexCOTdj4reavKKTI=.e8c2822c-6064-47c9-88d4-de50b980436a@github.com> Message-ID: On Thu, 30 May 2024 09:13:30 GMT, Gui Cao wrote: > Because some dev boards only support RVV version 0.7, In [JDK-8316859](https://bugs.openjdk.org/browse/JDK-8316859) we masked the use of HWCAP to probe for RVV extensions, and in the meantime, we can use hwprobe to probe for V extensions in Linux kernel 6.5 and above. But recently we got Banana Pi BPI-F3 board (has RVV1.0), but his kernel is 6.1.15, so the V extensions detected by HWCAP are masked. And we get the warning: `RVV is not supported on this CPU` when we enable UseRVV with the command, and we can't enable UseRVV correctly. 
> > Without Patch: > > zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > OpenJDK 64-Bit Server VM warning: RVV is not supported on this CPU > bool UseRVV = false {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = false {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) > > > With Patch: > > zifeihan at bananapif3:~/jre/jdk/bin$ ./java -XX:+PrintFlagsFinal -XX:+UseRVV -version | grep UseRVV > bool UseRVV = true {ARCH product} {command line} > bool UseRVVForBigIntegerShiftIntrinsics = true {ARCH product} {default} > openjdk version "23-internal" 2024-09-17 > OpenJDK Runtime Environment (build 23-internal-adhoc.zifeihan.jdk) > OpenJDK 64-Bit Server VM (build 23-internal-adhoc.zifeihan.jdk, mixed mode) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19472 From mli at openjdk.org Fri May 31 15:04:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 15:04:38 GMT Subject: RFR: 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? > As discussed, https://github.com/openjdk/jdk/pull/19246#discussion_r1613279908, it's worth to make the code more readable. > For movptr1, add some comments to help understand the tricky part. > For movptr2, it uses the similar (tricky) way as movptr1, so I align the code implementation with movptr1, and try to make it more straightforward. > I tried it, hope it's better. > Thanks. 
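The "tricky part" mentioned in the description above concerns bit 11 of the lower immediate: RISC-V `addi` sign-extends its 12-bit immediate, so when that bit is set the upper part has to be bumped by one to compensate. The sketch below shows this for a plain 32-bit `lui` + `addi` pair. It is a generic illustration under that assumption, not the `movptr` code from the PR (which builds wider addresses in more steps), and `HiLo`, `split_hi_lo` and `rebuild` are hypothetical names.

```cpp
#include <cstdint>

// Result of splitting a 32-bit constant for a lui + addi pair.
struct HiLo {
  std::uint32_t upper20;  // 20-bit immediate for lui
  std::int32_t lower12;   // sign-extended 12-bit immediate for addi, in [-2048, 2047]
};

// Split `value` so that (upper20 << 12) + lower12 == value (mod 2^32).
HiLo split_hi_lo(std::uint32_t value) {
  // Sign-extend the low 12 bits: if bit 11 is set, the result is negative.
  std::int32_t lower12 =
      static_cast<std::int32_t>((value & 0xfffu) ^ 0x800u) - 0x800;
  // Adding 0x800 before shifting carries into the upper part exactly when
  // bit 11 is set, which compensates for the negative lower12.
  std::uint32_t upper20 = ((value + 0x800u) >> 12) & 0xfffffu;
  return {upper20, lower12};
}

// Recombine the two parts the way lui followed by addi would (32-bit wraparound).
std::uint32_t rebuild(HiLo p) {
  return (p.upper20 << 12) + static_cast<std::uint32_t>(p.lower12);
}
```

For `0x12345678` bit 11 is clear, so the split is simply `0x12345` and `0x678`. For `0x12345FFF` bit 11 is set, so the lower part becomes `-1` and the upper part becomes `0x12346`.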
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19431/files - new: https://git.openjdk.org/jdk/pull/19431/files/93de6315..263aefba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19431&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19431&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19431/head:pull/19431 PR: https://git.openjdk.org/jdk/pull/19431 From mli at openjdk.org Fri May 31 15:04:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 15:04:38 GMT Subject: RFR: 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr [v2] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 13:08:11 GMT, Robbin Ehn wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> typo > > Seems fine to me! Thanks @robehn @RealFYang for your reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19431#issuecomment-2142452305 From mli at openjdk.org Fri May 31 15:04:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 15:04:39 GMT Subject: RFR: 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr [v2] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 14:49:47 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> typo > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1686: > >> 1684: // In case of 11th bit of `lower` is 0, it's straightforward to understand. >> 1685: // In case of 11th bit of `lower` is 1, it's a bit tricky, to help understand, >> 1686: // image divide both `upper` and `lower` into 2 parts respectively, i.e. 
> > Suggestion: s/image/imagine/ Thanks, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19431#discussion_r1622556913 From mli at openjdk.org Fri May 31 15:04:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 31 May 2024 15:04:39 GMT Subject: Integrated: 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr In-Reply-To: References: Message-ID: On Tue, 28 May 2024 15:36:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > As discussed, https://github.com/openjdk/jdk/pull/19246#discussion_r1613279908, it's worth to make the code more readable. > For movptr1, add some comments to help understand the tricky part. > For movptr2, it uses the similar (tricky) way as movptr1, so I align the code implementation with movptr1, and try to make it more straightforward. > I tried it, hope it's better. > Thanks. This pull request has now been integrated. Changeset: 914423e3 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/914423e3b7162ad934fa4edc46ee37e0f401d27b Stats: 32 lines in 1 file changed: 25 ins; 3 del; 4 mod 8332899: RISC-V: add comment and make the code more readable (if possible) in MacroAssembler::movptr Reviewed-by: rehn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/19431 From szaldana at openjdk.org Fri May 31 15:12:39 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 31 May 2024 15:12:39 GMT Subject: RFR: 8332785: Replace naked uses of UseSharedSpaces with CDSConfig::is_using_archive [v2] In-Reply-To: References: Message-ID: > Hi folks, > > This PR addresses [8332785](https://bugs.openjdk.org/browse/JDK-8332785) replacing all naked uses for ```UseSharedSpaces``` with ```CDSConfig::is_using_archive```. > > Testing: > - [x] Tier 1 with GHA. 
> > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Updating copyright headers and unnecessary CDSConfig:: ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19463/files - new: https://git.openjdk.org/jdk/pull/19463/files/f3e6b17c..ba4c0032 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19463&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19463&range=00-01 Stats: 15 lines in 15 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19463.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19463/head:pull/19463 PR: https://git.openjdk.org/jdk/pull/19463 From iklam at openjdk.org Fri May 31 18:46:04 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 31 May 2024 18:46:04 GMT Subject: RFR: 8293980: Resolve CONSTANT_FieldRef at CDS dump time [v3] In-Reply-To: References: <6tYcoQdH8mEhbBRUoAcOi3Gue7Lz9qIjJh3GpcLKGGE=.0b286773-cc03-434e-88ca-2c9cc5efce67@github.com> <2cV0qix4YBr6H58RLaKdtiRmmiQ222IFHGH8kw-bWCY=.278bc880-4b3e-481c-90df-dd45a94f7822@github.com> Message-ID: On Fri, 31 May 2024 12:21:09 GMT, Dan Heidinga wrote: >> If you look at the version in the Leyden repo, there are many different types of references that are handled in `ClassPrelinker::maybe_resolve_fmi_ref` >> >> https://github.com/openjdk/leyden/blob/4faa72029abb86b55cb33b00acf9f3a18ade4b77/src/hotspot/share/cds/classPrelinker.cpp#L307 >> >> My goal is to defer all the safety checks to `InterpreterRuntime::resolve_xxx` so that we don't need to think about what is safe to pre-resolve, and what is not. Some of the checks are very complex (see linkResolver.cpp as well) and may change as the language evolves. > > The current algorithm says: > > for each bytecode in each method: > switch(bytecode) { > case getfield: > case putfield: > InterpreterRuntime::resolve_get_put(bc, raw_index, mh, cp, false /*initialize_holder*/, CHECK); > break; > ....
> } > > What I'm proposing is: > > for each ResolvedFieldEntry > bool success = InterpreterRuntime::resolve_get_put(getfield, raw_index, nullptr /* mh */, cp, false /*initialize_holder*/, CHECK); > if (success) { > // also resolve for put > InterpreterRuntime::resolve_get_put(putfield, raw_index, nullptr /* mh */, cp, false /*initialize_holder*/, CHECK); > } > > > The `method` parameter is not critical as the "current" algorithm attempts resolution with multiple methods - once for each method that references the ResolvedFieldEntry. The resolution logic already has to handle dealing with different rules for different types of methods (ie `<clinit>` & `<init>`) for normal execution and "knows" not to resolve entries (like puts of final fields) regardless of the method as they need to do additional runtime checks on every access. > > The same will apply to invoke bytecodes later.... it feels safer to do only what the bytecodes in some method have asked for but the runtime already has to be robust against different kinds of gets/puts or invokes targeting the same cp entries. By eagerly resolving we're not giving up any safety. > > If you really want to only resolve the exact cases (ie: gets not puts, etc) that were resolved in the training run, then we need to write out as part of the classlist more explicitly what needs to be resolved: > ie: > > @cp_resolved_gets 4, 7 8 > @cp_resolved_puts 7 8 10 This makes sense. I will try to prototype it in the Leyden repo and then update this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19355#discussion_r1622828712 From never at openjdk.org Fri May 31 19:55:01 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 31 May 2024 19:55:01 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC In-Reply-To: References: Message-ID: On Thu, 30 May 2024 20:37:09 GMT, Tom Rodriguez wrote: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning.
JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. Testing of the combined Graal and JVMCI changes is in progress. I hit one main failure that I'm investigating and think I see the problem. I'll put the combo all together for testing once I've got that resolved. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19490#issuecomment-2142892561 From dcubed at openjdk.org Fri May 31 20:07:04 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 31 May 2024 20:07:04 GMT Subject: RFR: 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries In-Reply-To: References: Message-ID: On Thu, 30 May 2024 23:43:00 GMT, David Holmes wrote: > By using the `int*` type the assert could fail if the lower 32-bits of the function address were all zero. Trivial fix is to change to a type that is guaranteed the right size: `intptr_t*` > > Testing was done manually - see the JBS issue. > > Also run tier4 testing a sanity as it include `-Xcheck:jni`. > > Thanks. Thumbs up. This is a trivial fix. You should consider updating the copyright year. ------------- Marked as reviewed by dcubed (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19491#pullrequestreview-2091634633 From never at openjdk.org Fri May 31 21:03:30 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 31 May 2024 21:03:30 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v2] In-Reply-To: References: Message-ID: > This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. > > I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. 
Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: Fix riscv compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19490/files - new: https://git.openjdk.org/jdk/pull/19490/files/bf01ebde..bb91b42c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19490&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19490&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19490/head:pull/19490 PR: https://git.openjdk.org/jdk/pull/19490 From eosterlund at openjdk.org Fri May 31 21:20:15 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 31 May 2024 21:20:15 GMT Subject: RFR: 8333300: [JVMCI] add support for generational ZGC [v2] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 21:03:30 GMT, Tom Rodriguez wrote: >> This exposes the required values for JVMCI to support generational ZGC. It includes a few things worth mentioning. JVMCI still exports XBarrierSetRuntime as fields in CompilerToVM::Data under the original name of ZBarrierSetRuntime. I have exported the XBarrierSetRuntime and ZBarrierSetRuntime functions as addresses under their actual name. This permits backward compatibility until all the required parts are in place. We can eventually delete the CompilerToVM::Data names. >> >> I added ZBarrierSetRuntime::load_barrier_on_oop_array paralleling XBarrierSetRuntime::load_barrier_on_oop_array as we use that for a vector barrier. I could create the function as part of JVMCIRuntime if there are any concerns about including that in the ZGC core. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Fix riscv compilation Great work. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19490#pullrequestreview-2091722595 From dholmes at openjdk.org Fri May 31 23:12:27 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 May 2024 23:12:27 GMT Subject: RFR: 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries [v2] In-Reply-To: References: Message-ID: > By using the `int*` type the assert could fail if the lower 32 bits of the function address were all zero. The trivial fix is to change to a type that is guaranteed the right size: `intptr_t*` > > Testing was done manually - see the JBS issue. > > Also ran tier4 testing as a sanity check, as it includes `-Xcheck:jni`. > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19491/files - new: https://git.openjdk.org/jdk/pull/19491/files/4241674b..51fa807f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19491&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19491&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19491.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19491/head:pull/19491 PR: https://git.openjdk.org/jdk/pull/19491 From dholmes at openjdk.org Fri May 31 23:12:27 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 31 May 2024 23:12:27 GMT Subject: RFR: 8332935: Crash: assert(*lastPtr != 0) failed: Mismatched JNINativeInterface tables, check for new entries [v2] In-Reply-To: References: Message-ID: On Fri, 31 May 2024 20:04:23 GMT, Daniel D. Daugherty wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Copyright year > > Thumbs up. This is a trivial fix. You should consider updating the copyright year. Thanks for the review @dcubed-ojdk ! Copyright year updated. 
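For readers of the archive, the width problem described for 8332935 can be illustrated with a minimal sketch. This is not the HotSpot code: `Slot`, `nonzero_as_int`, `nonzero_as_full_width`, and `example_shows_truncation` are hypothetical names, and the sketch assumes a little-endian host with a 32-bit `int`, so that reading the first `sizeof(int)` bytes of a 64-bit slot yields its lower 32 bits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one 64-bit entry of a function table.
using Slot = std::uint64_t;

// Reads only the first sizeof(int) bytes of the slot, which is what a
// dereference through an int* would look at. memcpy avoids aliasing issues.
bool nonzero_as_int(const Slot* slot) {
    int v = 0;
    std::memcpy(&v, slot, sizeof(v));
    return v != 0;
}

// Reads the whole slot, which is what a pointer-sized integer type sees.
bool nonzero_as_full_width(const Slot* slot) {
    Slot v = 0;
    std::memcpy(&v, slot, sizeof(v));
    return v != 0;
}

// True when the narrow read misses a non-zero entry that the full read finds.
bool example_shows_truncation() {
    Slot entry = Slot{1} << 32;  // non-zero value whose lower 32 bits are zero
    return !nonzero_as_int(&entry) && nonzero_as_full_width(&entry);
}
```

Under those assumptions `example_shows_truncation()` returns true: the entry is valid, yet a check through the narrower type would treat it as zero, which matches the failure mode the description attributes to the `int*` type.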
------------- PR Comment: https://git.openjdk.org/jdk/pull/19491#issuecomment-2143088973 From sspitsyn at openjdk.org Fri May 31 23:55:20 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 May 2024 23:55:20 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v3] In-Reply-To: References: Message-ID: > Please review the following `interp-only` issue related to carrier threads. > There are 3 problems fixed here: > - The `EnterInterpOnlyModeClosure::do_threads` is taking the `JvmtiThreadState` with the `jt->jvmti_thread_state()`, which is incorrect when we are dealing with a carrier thread. The target state is known at the point when the `HandshakeClosure` is set, so the fix is to pass it as a constructor parameter. > - The `state->is_pending_interp_only_mode()` was processed at mounts only, but it has to be processed for unmounts as well. > - The test `test/hotspot/jtreg/serviceability/jvmti/vthread/MethodExitTest/libMethodExitTest.cpp` has a wrong assumption that there can't be a `MethodExit` event on the carrier thread when the function `breakpoint_hit1` is being executed. However, it can happen if the virtual thread gets unmounted. > > The fix also includes a new test case `vthread/CarrierThreadEventNotification` developed by Patricio. 
> > Testing: > - Ran new test case locally > - Ran mach5 tiers 1-6 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: refactored def and use of process_pending_interp_only() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19438/files - new: https://git.openjdk.org/jdk/pull/19438/files/2f75975f..19e4d8fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19438&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19438&range=01-02 Stats: 36 lines in 4 files changed: 16 ins; 18 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19438/head:pull/19438 PR: https://git.openjdk.org/jdk/pull/19438 From sspitsyn at openjdk.org Fri May 31 23:58:07 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 31 May 2024 23:58:07 GMT Subject: RFR: 8311177: Switching to interpreter only mode in carrier thread can lead to crashes [v2] In-Reply-To: References: <55rWd_Kn3Jf8kfmkMtVnzRVs_o0KK_jnuZthiS9awDA=.555b5928-38d1-422c-9014-7d4cf31a950d@github.com> Message-ID: <6wadlhiKWk4vvtNXh3UGCf7o9giMAQENl13TZ-gTjc4=.d60fa68f-ebef-4ac3-a7c7-fe16c9cc6438@github.com> On Thu, 30 May 2024 18:59:10 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: addressed nits in new test > > src/hotspot/share/prims/jvmtiThreadState.cpp line 674: > >> 672: } >> 673: // enable interp_only_mode for carrier thread if it has pending bit >> 674: process_pending_interp_only(thread); > > So for the last unmount case we will call this before doing the JVMTI state rebinding, but shouldn't it be called after it in VTMS_vthread_end? Actually why not move this call inside rebind_to_jvmti_thread_state_of()? Thank you for the comment! I was also thinking about placing it in `rebind_to_jvmti_thread_state_of()`. 
I've made and pushed this change now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19438#discussion_r1623046562
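The first fix listed in the 8311177 description (pass the target state to the closure as a constructor parameter instead of looking it up from the thread) can be sketched with toy types. Everything below is hypothetical and is not HotSpot code: `ToyState`, `ToyThread`, `LookupAtRunClosure`, and `CapturedStateClosure` are invented names, and the model assumes that the state currently bound to a thread may belong to a mounted virtual thread rather than to the carrier.

```cpp
#include <string>

// Hypothetical toy types; none of these are HotSpot classes.
struct ToyState {
    std::string owner;
    bool interp_only = false;
};

struct ToyThread {
    ToyState* current_state;  // whatever state is bound to the thread right now
};

// Looks the state up from the thread at the moment the closure runs,
// so it affects whichever state happens to be bound at that time.
class LookupAtRunClosure {
public:
    void do_thread(ToyThread* t) { t->current_state->interp_only = true; }
};

// Receives the intended state when the closure is constructed,
// so the choice no longer depends on what is bound when it runs.
class CapturedStateClosure {
    ToyState* _state;
public:
    explicit CapturedStateClosure(ToyState* state) : _state(state) {}
    void do_thread(ToyThread*) { _state->interp_only = true; }
};
```

In this toy model, if the caller intends to mark the carrier's state while a virtual thread's state is bound to the thread, `LookupAtRunClosure` marks the wrong object and `CapturedStateClosure` marks the intended one. Whether this is exactly the failure seen in the real code is an assumption; the PR itself is the authoritative reference.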